We consider the problem of the transmission of discrete-time analog data with a variety of fidelity criteria. The outputs of the analog source are assumed to belong to a bounded set. Bounds on the minimum achievable average distortion for memoryless sources are derived both for the case where the coding delay is infinite (an extension of the Shannon Theory) and for some cases where the coding delay is finite. Several examples are given for which the upper and lower bounds coincide.

Further, we discuss the case where the assumption of the existence of a probabilistic model for the source is dropped. We adopt as our fidelity criterion the supremum, over all possible source-output n-sequences x, of the conditional expectation of the distortion given x (the "guaranteed distortion"). The Shannon Theory is not directly applicable in determining the minimum guaranteed distortion. We do obtain results for two important cases. Some generalizations and applications are also discussed.

I. INTRODUCTION 

In this paper we are concerned with the communication of discrete-time analog data over a channel with a variety of fidelity criteria. The central assumption about the analog source is that its outputs belong to a bounded set, typically the interval $[-A/2, A/2]$. We begin with a rough outline of our results, leaving the precise formulation and statement to Section II. Proofs are found in Section III.

Suppose that we have a data source which emits a sequence of symbols $x_1, x_2, \ldots \in \mathcal{X}$ (an arbitrary set) at a rate of $\rho_s$ per second. This sequence is fed into an "encoder" which assigns to each successive block of $n$ source symbols, say $\mathbf{x} = (x_1, x_2, \ldots, x_n)$, a channel input of duration $n/\rho_s = T$ seconds. At the receiving end of the channel, the $T$-second output is transformed by a "decoder" into an $n$-sequence, say $\hat{\mathbf{x}} = (\hat{x}_1, \hat{x}_2, \ldots, \hat{x}_n)$, which is delivered to the destination. The "distortion" between the source output sequence $\mathbf{x}$ and the received sequence $\hat{\mathbf{x}}$ is defined as $d^{(n)}(\mathbf{x}, \hat{\mathbf{x}}) = n^{-1}\sum_{k=1}^{n} d(x_k, \hat{x}_k)$, where $d(x, \hat{x}) \ge 0$ is an arbitrary function.

The classical problem is that of a "memoryless" source, where successive source outputs are statistically independent with identical probability distributions. In this case it is meaningful to let the system performance criterion (fidelity criterion) be the statistical expectation of the distortion $d^{(n)}(\mathbf{x}, \hat{\mathbf{x}})$. A quantity of interest is $d^*(T)$, the smallest attainable value of the fidelity criterion when the coding delay is $T$ seconds. The Shannon Theory gives the asymptotic behavior of $d^*(T)$ as $T \to \infty$, but in many cases this limit is difficult to evaluate analytically. Theorem 1 (in Section 2.2) considers the case where the source output set is $\mathcal{X} = [-A/2, A/2]$ and the function $d(x, \hat{x})$ depends only on the difference $\hat{x} - x$; it gives a lower bound on $\lim_{T\to\infty} d^*(T)$. The examples which follow the theorem illustrate the applicability and utility of the bound.

There are two cases in which we are particularly interested. In the first, the source set is $\mathcal{X} = \{0, 1, \ldots, K-1\}$ with a uniform distribution, and $d(x, \hat{x}) = 0$ or $1$ according as $x = \hat{x}$ or $x \ne \hat{x}$. Thus the fidelity criterion is the error rate. For this case let $d^*(T) = P_e(T, K)$. In the second case, $\mathcal{X} = [-A/2, A/2]$ with a uniform distribution, and $d(x, \hat{x}) = 0$ or $1$ according as $|x - \hat{x}| < \delta$ or $|x - \hat{x}| \ge \delta$ (where $\delta > 0$). In this case let $d^*(T) = Q(T, A, \delta)$. It turns out that $P_e$ and $Q$ are intimately related. In fact, it is a consequence of Theorem 2 (Section 2.2) that if $A/(2\delta) = K_0$, an integer, then $Q(T, A, \delta) = P_e(T, K_0)$. This result is valid for all values of the delay parameter $T$. From it we can deduce that the optimal encoder for the analog source $\mathcal{X} = [-A/2, A/2]$ is a "uniform" quantizer followed by an optimal "digital" encoder. This is the only known case in which analog-to-digital conversion is known to be optimal for finite $T$ for the transmission of analog data from a memoryless source.

We now drop the assumption of a memoryless source; in fact, we do not even assume that there is a probabilistic model for the source. Instead of the expectation of the distortion, we adopt as our fidelity criterion the supremum, over all possible source output $n$-sequences $\mathbf{x}$, of the conditional expectation of the distortion given $\mathbf{x}$. We call this criterion the "guaranteed distortion". Let $\bar{d}^*(T)$ be the minimum attainable guaranteed distortion for a system with delay parameter $T$. The Shannon Theory is not directly applicable in determining $\bar{d}^*(T)$. We do obtain results for the two interesting cases discussed below.

In the first, $\mathcal{X} = \{0, 1, \ldots, K-1\}$ and $d(x, \hat{x}) = 0$ or $1$, respectively, when $x = \hat{x}$ or $x \ne \hat{x}$. For this case let $\bar{d}^*(T) = \bar{P}_e(T, K)$. It is a consequence of Theorem 3 (Section 2.3) that $\lim_{T\to\infty} \bar{P}_e(T, K) = \lim_{T\to\infty} P_e(T, K)$, which is known from the Shannon Theory.

In the second case, $\mathcal{X} = [-A/2, A/2]$ and $d(x, \hat{x}) = 0$ or $1$, respectively, when $|x - \hat{x}| < \delta$ or $|x - \hat{x}| \ge \delta$. For this case let $\bar{d}^*(T) = \bar{Q}(T, A, \delta)$. Theorem 4 (Section 2.3) relates $\bar{P}_e$ and $\bar{Q}$ by

$$\bar{Q}(T, A, \delta) = \bar{P}_e(T, M),$$

where $M$ is the unique integer satisfying $M - 1 \le A/(2\delta) < M$. Here too we can deduce the optimality of analog-to-digital conversion. Theorem 4 is generalized by Theorem 5 (Section 2.4) to apply to an arbitrary set $\mathcal{X}$ with a distance-like measure defined on it (replacing $|x - \hat{x}|$).

In Section 2.5 we give some applications of the above results. In particular, we obtain some results for the distortion $d(x, \hat{x}) = |x - \hat{x}|^s$.

In order to state our results completely and precisely, it is unfortunately necessary to give a rather large collection of definitions and to introduce a large number of symbols. To ease the reader's burden somewhat, we have included a glossary of symbols in the appendix.

II. STATEMENT OF THE PROBLEM AND PRINCIPAL RESULTS

In Section 2.1 we define a "channel" (and its "capacity") in a very 
general and abstract way. We do this because the nature of the channel 
does not figure explicitly in our results (except for the channel capacity), 
and we want our results to apply as broadly as possible. In Section 2.2 
we describe the communication system which we shall consider, and state our results for the case of a "memoryless" information source.
The remainder of the results follows in Sections 2.3-2.5. 

2.1 Channel and Channel Capacity 

A channel is defined as follows. For every $T > 0$ we have a set $\mathcal{W}_T$ of "allowable" inputs and a set $\mathcal{Z}_T$ of possible outputs. Every $T$ seconds some $w \in \mathcal{W}_T$ is transmitted through the channel, and the channel output $z$ is a member of $\mathcal{Z}_T$. The output is related to the input $w \in \mathcal{W}_T$ by a probability measure $\mu_w$ on the set $\mathcal{Z}_T$. Thus, given that $w \in \mathcal{W}_T$ is transmitted, the probability that $z \in B$ [where $B$ is a (measurable) subset of $\mathcal{Z}_T$] is $\mu_w(B)$. For example, $\mathcal{W}_T$ and $\mathcal{Z}_T$ may be the set of binary sequences of length $[T]^+$; the measure $\mu_w$ is then a discrete conditional probability distribution. Another example is the case where $\mathcal{W}_T$ and $\mathcal{Z}_T$ are sets of real-valued functions defined on the interval $[0, T]$, and the members of $\mathcal{W}_T$ have "energy" not exceeding $PT$.

With $T$ specified, a block code with parameter $N$ is a set of $N$ pairs $\{(w_i, B_i)\}_{i=1}^{N}$, where the $w_i \in \mathcal{W}_T$ are called code words and the $B_i$ are disjoint (measurable) subsets of $\mathcal{Z}_T$ called decoding sets. If code word $w_i$ ($1 \le i \le N$) is transmitted, the resulting error probability is

$$\lambda_i = \Pr\{z \notin B_i \mid w_i \text{ is transmitted}\} = 1 - \mu_{w_i}(B_i). \qquad (1)$$

The word error probability for the code is

$$\lambda = \max_{1 \le i \le N} \lambda_i. \qquad (2)$$

Let $\lambda^*(T, N)$ be the smallest attainable word error probability for a code with parameters $T$ and $N$. The channel capacity $C$ is defined as the supremum of those numbers $R \ge 0$ for which

$$\lambda^*(T, [e^{RT}]^-) \to 0 \quad \text{as } T \to \infty.$$

Let us define the average word error probability by

$$\bar{\lambda} = \frac{1}{N}\sum_{i=1}^{N} \lambda_i. \qquad (3)$$

Thus $\bar{\lambda}$ is the average error probability which results when each of the $N$ code words is equally likely to be transmitted. Let $\bar{\lambda}^*(T, N)$ be the smallest attainable value of $\bar{\lambda}$ for a code with parameters $T$ and $N$. Since $\bar{\lambda} \le \lambda$ for any code, it follows from the above definition of channel capacity that for any $R < C$,

$$\bar{\lambda}^*(T, [e^{RT}]^-) \to 0 \quad \text{as } T \to \infty.$$

Further, it is known that for a large class of channels, including the memoryless Gaussian channel and discrete memoryless channels,¹

$$\bar{\lambda}^*(T, [e^{CT}]^-) \to \tfrac{1}{2} \quad \text{as } T \to \infty. \qquad (4)$$

[It is also true that for many of these same channels, if $R > C$, $\bar{\lambda}^*(T, [e^{RT}]^-)$ tends to 1 as $T \to \infty$, but we do not need this fact here.]

† Throughout this paper we denote by $[x]^-$ and $[x]^+$ the largest integer $\le x$ and the smallest integer $\ge x$, respectively ($0 < x < \infty$).

Let us remark here that for a large class of channels (including "memoryless" channels and "finite state channels"), the capacity $C$ is known to be the supremum of a quantity called the "information". In fact, this equivalence is the essence of the Fundamental Theorem of Information Theory. It will not be necessary, however, to explore this equivalence further.

2.2 Memoryless Source and Communication With a Fidelity Criterion 

Consider the communication system of Fig. 1. The output of the source is a sequence of random variables $X_1, X_2, \ldots$ from an arbitrary subset $\mathcal{X}$ of Euclidean $p$-space. Assume that these random variables are statistically independent and identically distributed with probability density function $P_s(x)$, $x \in \mathcal{X}$. If we allow impulses in the density function, then the $X_k$ can be discrete random variables. Say that the source outputs appear at a rate of $\rho_s$ per second. The encoder waits $T$ seconds (called the "delay"), during which time $n = \rho_s T$ symbols, say $X_1, X_2, \ldots, X_n \in \mathcal{X}$, have appeared at its input. (Assume that $\rho_s T$ is an integer.) Denote the $T$-second output of the source by the random $n$-vector $\mathbf{X} = (X_1, X_2, \ldots, X_n) \in \mathcal{X}^n$.

The channel is defined as above (Section 2.1), so that during the $T$ seconds which it takes for the $n$-vector $\mathbf{X}$ to appear, the channel can process an input belonging to the channel input set $\mathcal{W}_T$. It is the task of the encoder to assign to each possible source output $n$-vector $\mathbf{X} = \mathbf{x}$ a channel input $f_E(\mathbf{x}) \in \mathcal{W}_T$. The channel output is a member $Z$ of the channel output set $\mathcal{Z}_T$, and it is the task of the decoder to assign to each possible $Z = z$ an $n$-vector $\hat{\mathbf{X}} = f_D(z) \in \mathcal{X}^n$. Note that the source and channel statistics define a joint probability density on the random $n$-vectors $\mathbf{X}$ and $\hat{\mathbf{X}}$.

Ideally we would like $\hat{\mathbf{X}} = \mathbf{X}$, but this is most often not possible because of imperfections (for example, noise) in the channel. Thus we define a fidelity criterion which we use as a measure of the reliability of the system. Suppose we are given a non-negative distortion function $d(x, \hat{x})$ defined on $\mathcal{X} \times \mathcal{X}$. Typical choices of the distortion function are $d(x, \hat{x}) = |x - \hat{x}|^s$ ($s > 0$) when $\mathcal{X}$ is a subset of the reals (that is, the dimensionality $p = 1$), or the "Hamming" distance

$$d(x, \hat{x}) = d_H(x, \hat{x}) = \begin{cases} 0, & x = \hat{x}, \\ 1, & x \ne \hat{x}, \end{cases} \qquad (5)$$

where $\mathcal{X}$ is a discrete (that is, countable) set.

Fig. 1 — Communication system.

The distortion between the $n$-vectors $\mathbf{x} = (x_1, x_2, \ldots, x_n)$ and $\hat{\mathbf{x}} = (\hat{x}_1, \hat{x}_2, \ldots, \hat{x}_n)$ is

$$d^{(n)}(\mathbf{x}, \hat{\mathbf{x}}) = n^{-1}\sum_{k=1}^{n} d(x_k, \hat{x}_k).$$

Our system performance (fidelity) criterion, which we seek to minimize, is

$$d = E\, d^{(n)}(\mathbf{X}, \hat{\mathbf{X}}),$$

where $E$ denotes expectation with respect to the joint probability distribution of $\mathbf{X}$ and $\hat{\mathbf{X}}$. For a given delay $T$, which corresponds to $n = \rho_s T$, let $d^*(T)$ denote the infimum (over all encoder-decoder pairs) of the attainable values of $d$ (for given $\rho_s$ and source and channel statistics). Although we usually do not know $d^*(T)$ exactly, we do know its asymptotic behavior as $T \to \infty$. We proceed as follows.

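For $0 \le \beta \le \infty$, define $\mathcal{M}(\beta)$ as the set of probability density functions $p(x, \hat{x})$ defined on $\mathcal{X} \times \mathcal{X}$ which satisfy

(i) $\int_{\mathcal{X}} p(x, \hat{x})\, d\hat{x} = P_s(x)$, the source output probability density function;
(ii) $\int_{\mathcal{X}}\int_{\mathcal{X}} d(x, \hat{x})\, p(x, \hat{x})\, dx\, d\hat{x} \le \beta$.

The information corresponding to the density $p(x, \hat{x}) \in \mathcal{M}(\beta)$ is defined as

$$I\{p(x, \hat{x})\} = \int_{\mathcal{X}}\int_{\mathcal{X}} p(x, \hat{x}) \log \frac{p(x, \hat{x})}{P_s(x)\, p_2(\hat{x})}\, dx\, d\hat{x}, \qquad (6)$$

where $p_2(\hat{x}) = \int_{\mathcal{X}} p(x, \hat{x})\, dx$. It is easy to show that $I \ge 0$, with equality if and only if $p(x, \hat{x}) = P_s(x)\, p_2(\hat{x})$. Finally, define the equivalent rate of the source

$$R_{eq}(\beta) = \inf_{p(x, \hat{x}) \in \mathcal{M}(\beta)} I\{p(x, \hat{x})\}. \qquad (7)$$

$R_{eq}(\beta)$ is usually called the "rate-distortion function". Note that, for a given distortion function, $R_{eq}(\beta)$ depends only on $\beta$ and $P_s(x)$.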

Let us now return to the quantity $d^*(T)$. Shannon's well-known theorems tell us the following.² For a given communication system (as in Fig. 1),

$$\text{(i)}\quad d^*(T) \ge d_0 \quad \text{for all } T,$$
$$\text{(ii)}\quad d^*(T) \to d_0 \quad \text{as } T \to \infty, \qquad (8)$$

where $d_0$ is the smallest solution of

$$\rho_s R_{eq}(d_0) \le C,$$

and $C$ is the capacity of the channel.

Some intuitive insight into the meaning of Shannon's theorem can be gained by thinking of $\rho_s R_{eq}(\beta)$ as the equivalent rate, in nats per second, of the source when reproduced with distortion $\beta$. It is reasonable, then, to suppose that the minimum attainable distortion $d_0$ is the distortion for which the source rate just equals the channel capacity $C$.

There are two well-known cases for which $R_{eq}(\beta)$ is known explicitly. The first is the case where $\mathcal{X}$ is the set of reals, $P_s(x) = (2\pi\sigma^2)^{-1/2}\exp(-x^2/2\sigma^2)$, and $d(x, \hat{x}) = (x - \hat{x})^2$. In this case $R_{eq}(\beta) = \tfrac{1}{2}\log(\sigma^2/\beta)$ for $0 < \beta \le \sigma^2$, so that $d_0 = \sigma^2 \exp(-2C/\rho_s)$.

The second case (which is important in the sequel) is $\mathcal{X} = \{0, 1, 2, \ldots, K-1\}$ ($K = 2, 3, \ldots$), $P_s(x) = \sum_{k=0}^{K-1} (1/K)\,\delta(x - k)$ [$\delta(x)$ is the unit impulse], and $d(x, \hat{x})$ given by equation (5). In other words, the source output is a sequence of independent random variables, each uniformly distributed on the $K$-ary alphabet $\{0, 1, \ldots, K-1\}$. The quantity $d$ is the average fraction of symbols received in error, and is often called the "error rate". In this case we write $d^*(T) = P_e(T, K)$, where the dependence of $P_e$ on $K$ as well as on $T$ is indicated explicitly. For this case it is known that²



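$$R_{eq}(\beta) = \begin{cases} \log K - h(\beta) - \beta \log(K-1), & 0 \le \beta \le \dfrac{K-1}{K}, \\[4pt] 0, & \beta \ge \dfrac{K-1}{K}, \end{cases} \qquad (9a)$$

where

$$h(\beta) = -\beta \log \beta - (1 - \beta)\log(1 - \beta), \qquad 0 \le \beta \le 1. \qquad (9b)$$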
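As an independent numerical check of (9a), one can trace the rate-distortion curve of the uniform $K$-ary source with Hamming distortion using the Blahut-Arimoto iteration. That algorithm is not used in this paper; it is swapped in here only as an illustration, and the function names are hypothetical. Rates are in nats, matching the natural logarithms used throughout.

```python
import math

def hamming_rd_point(K, slope, iters=200):
    """One (D, R) point of the rate-distortion curve of a uniform K-ary source
    with Hamming distortion (5), via the Blahut-Arimoto iteration.
    `slope` < 0 is the Lagrange multiplier; R is in nats."""
    p_x = [1.0 / K] * K                  # uniform source distribution P_s
    q = [1.0 / K] * K                    # output distribution p_2 (initial guess)
    d = [[0.0 if i == j else 1.0 for j in range(K)] for i in range(K)]
    for _ in range(iters):
        # test channel: p(xhat | x) proportional to q(xhat) * exp(slope * d)
        cond = []
        for i in range(K):
            w = [q[j] * math.exp(slope * d[i][j]) for j in range(K)]
            z = sum(w)
            cond.append([wj / z for wj in w])
        # re-estimate the output distribution as the marginal of the test channel
        q = [sum(p_x[i] * cond[i][j] for i in range(K)) for j in range(K)]
    D = sum(p_x[i] * cond[i][j] * d[i][j] for i in range(K) for j in range(K))
    R = sum(p_x[i] * cond[i][j] * math.log(cond[i][j] / q[j])
            for i in range(K) for j in range(K))
    return D, R

def r_eq_9a(K, beta):
    """Closed form (9a)-(9b)."""
    if beta >= (K - 1) / K:
        return 0.0
    h = 0.0 if beta == 0 else -beta * math.log(beta) - (1 - beta) * math.log(1 - beta)
    return math.log(K) - h - beta * math.log(K - 1)
```

Each slope yields one $(D, R)$ pair; sweeping the slope from 0 toward $-\infty$ traces (9a) from $\beta = (K-1)/K$ down to $\beta = 0$.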
Shannon's theorem, equation (8), tells us that

$$P_e(T, K) \to \gamma(K, \rho_s, C) \quad \text{as } T \to \infty, \qquad (10a)$$

where $\gamma(K, \rho_s, C)$ is the smallest solution $\beta$ of

$$\rho_s R_{eq}(\beta) \le C, \qquad (10b)$$

with $R_{eq}$ given by (9a), and $C$ is the channel capacity. A graph of $\gamma(K, \rho_s, C)$ versus $C/\rho_s$ for various values of $K$ is given in Fig. 2. Notice that $\gamma(K, \rho_s, C)$ decreases from $(K-1)/K$ to zero as $C/\rho_s$ increases from zero to $\log K$.

3146 THE BELL SYSTEM TECHNICAL JOURNAL, DECEMBER 1969

Fig. 2 — $\gamma(K, \rho_s, C)$ versus $C/\rho_s$ ($K$ a parameter).

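Let us also remark that the quantity $P_e(T, K)$ is related to $\bar{\lambda}^*(T, N)$ (the smallest attainable average word error probability). In fact, it is easy to show that

$$\frac{1}{n}\,\bar{\lambda}^*(T, K^n) \le P_e(T, K) \le \bar{\lambda}^*(T, K^n), \qquad (11)$$

where $n = \rho_s T$ (assumed to be an integer). (A decoded block of $n$ symbols that is in error contains at least one and at most $n$ symbol errors.)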

Now, in the general case [arbitrary $P_s(x)$ and $d(x, \hat{x})$], it is usually not possible to obtain a closed-form expression for $R_{eq}(\beta)$. Theorem 1, stated below, gives a useful bound on $R_{eq}(\beta)$ for the case where $P_s(x)$ is a density and $\mathcal{X}$ is a bounded set. This theorem is an extension of a result of Shannon;² the proof is given in Section 3.1.

Let $\mathcal{X}$ be the interval $[-A/2, A/2]$, where $A$ ($0 < A < \infty$) is arbitrary. Let the source outputs $X$ have density $P_s(x)$, and let $d(x, \hat{x}) = r(x - \hat{x})$, where $r(u)$ satisfies

$$\text{(i)}\quad r(u) = r(-u),$$
$$\text{(ii)}\quad r(u) \ge 0, \text{ with equality at } u = 0,$$
$$\text{(iii)}\quad r(u) \text{ is continuous at } u = 0. \qquad (12)$$

Then it can be shown (see Ref. 3, Appendix A) that for $0 < \beta \le A^{-1}\int_{-A/2}^{A/2} r(u)\, du$ there exists a unique $\lambda_0(\beta)$ which satisfies

$$\int_{-A/2}^{A/2} r(u)\, e^{-\lambda_0(\beta) r(u)}\, du = \beta \int_{-A/2}^{A/2} e^{-\lambda_0(\beta) r(u)}\, du. \qquad (13)$$
Define the probability density $g_\beta(x)$ on $\mathcal{X}$ by

$$g_\beta(x) = \frac{e^{-\lambda_0(\beta) r(x)}}{\int_{-A/2}^{A/2} e^{-\lambda_0(\beta) r(x)}\, dx} \qquad (14a)$$

[note that, by (13), $\int r(x)\, g_\beta(x)\, dx = \beta$], and let

$$H_1(\beta) = -\int_{-A/2}^{A/2} g_\beta(x) \log g_\beta(x)\, dx \qquad (14b)$$

be the corresponding "entropy".

For $A = \infty$, equation (13) has a solution in many cases. In particular, when $r(u) = |u|^s$ ($s > 0$), equation (13) has a solution for $0 < \beta < \infty$. Thus $g_\beta(x)$ and $H_1(\beta)$ are meaningful for $A = \infty$ also.

We now state the lower bound on $R_{eq}(\beta)$ as Theorem 1.

Theorem 1: For the source defined above, for $0 < \beta \le A^{-1}\int_{-A/2}^{A/2} r(u)\, du$,

$$R_{eq}(\beta) \ge H_s - H_1(\beta), \qquad (15a)$$

where

$$H_s = -\int_{-A/2}^{A/2} P_s(x) \log P_s(x)\, dx \qquad (15b)$$

is the entropy of the source density $P_s(x)$, and $H_1(\beta)$ is defined in equations (13) and (14). Inequality (15a) also holds for $A = \infty$ when $r(u) = |u|^s$ ($s > 0$).

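Examples:

(i) Say $\mathcal{X}$ is the set of reals, and $d(x, \hat{x}) = r(x - \hat{x}) = |x - \hat{x}|^s$, where $s > 0$ is arbitrary. Theorem 1 is applicable with $A = \infty$. Solving equation (13) yields $\lambda_0(\beta) = (s\beta)^{-1}$ and

$$g_\beta(x) = \frac{\exp[-|x|^s/(s\beta)]}{2\,\Gamma(1 + 1/s)\,(s\beta)^{1/s}},$$

so that

$$R_{eq}(\beta) \ge H_s - H_1(\beta), \qquad (16a)$$

where

$$H_1(\beta) = \log\left[2\,\Gamma(1 + 1/s)\,(se\beta)^{1/s}\right] \qquad (16b)$$

and $H_s$ is given by equation (15b).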
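For finite $A$ the quantities in (13) and (14) generally have to be computed numerically. The sketch below assumes $r(u) = |u|^s$, solves (13) for $\lambda_0(\beta)$ by bisection (the ratio in (13) decreases as $\lambda_0$ grows), and evaluates $H_1(\beta)$ through the identity $H_1(\beta) = \lambda_0(\beta)\,\beta + \log\int e^{-\lambda_0 r}$, which follows from (14a) and the note after it. For large $A$ the result should approach the closed form (16b); for the uniform density $P_s(x) = A^{-1}$ one has $H_s = \log A$, so $\log A - H_1(\beta)$ is the lower bound of Theorem 1. The integration rule (composite Simpson) and all names are our own choices.

```python
import math

def simpson(f, a, b, n=4000):
    """Composite Simpson rule on [a, b] with an even number n of panels."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3

def mean_r(lam, A, s):
    """Ratio of the two integrals in (13) for r(u) = |u|**s; by symmetry both
    are taken over [0, A/2], and the common factor 2 cancels."""
    num = simpson(lambda u: u ** s * math.exp(-lam * u ** s), 0.0, A / 2)
    den = simpson(lambda u: math.exp(-lam * u ** s), 0.0, A / 2)
    return num / den

def solve_lambda0(beta, A, s, iters=50):
    """Solve (13) for lambda_0(beta) by bisection."""
    if not 0 < beta < mean_r(0.0, A, s):
        raise ValueError("need 0 < beta < (1/A) * integral of r over [-A/2, A/2]")
    lo, hi = 0.0, 1.0
    while mean_r(hi, A, s) > beta:       # enlarge the bracket until it holds the root
        hi *= 2
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mean_r(mid, A, s) > beta:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def H1_numeric(beta, A, s):
    """H_1(beta) of (14b) via H_1 = lambda_0 * beta + log(integral of exp(-lambda_0 r))."""
    lam = solve_lambda0(beta, A, s)
    Z = 2 * simpson(lambda u: math.exp(-lam * u ** s), 0.0, A / 2)
    return lam * beta + math.log(Z)

def H1_closed_form(beta, s):
    """Equation (16b), valid for A = infinity."""
    return math.log(2 * math.gamma(1 + 1 / s)) + math.log(s * math.e * beta) / s
```

With $s = 2$ the closed form reduces to $\tfrac{1}{2}\log 2\pi e\beta$, the term appearing in (17a).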
(ii) Quadratic distortion: Let $\mathcal{X}$ be the set of reals, and $d(x, \hat{x}) = (x - \hat{x})^2$. Then from example (i), with $s = 2$,

$$R_{eq}(\beta) \ge H_s - \tfrac{1}{2}\log 2\pi e\beta. \qquad (17a)$$

Further, Shannon⁴ has given the following upper bound to $R_{eq}(\beta)$:

$$R_{eq}(\beta) \le \tfrac{1}{2}\log\frac{\sigma^2}{\beta}, \qquad \beta \le \sigma^2, \qquad (17b)$$

where $\sigma^2 = \int x^2 P_s(x)\, dx$. Note that when $P_s(x) = (2\pi)^{-1/2}\exp(-x^2/2)$, the upper and lower bounds of inequalities (17) coincide for $\beta \le \sigma^2$. [Since $R_{eq}(\beta)$ is non-increasing, $R_{eq}(\beta) = 0$ for $\beta > \sigma^2$.]

Another case of interest is $\mathcal{X} = [-A/2, A/2]$ ($A < \infty$), $P_s(x) = A^{-1}$, and $d(x, \hat{x}) = (x - \hat{x})^2$. In this case Theorem 1 (applied for finite $A$) provides a lower bound on $R_{eq}(\beta)$ which is tighter than that of inequality (17a) and can be evaluated numerically. An upper bound can be found by computing $I[p_0(x, \hat{x})]$, where $p_0(x, \hat{x})$, a joint probability density for $X$ and $\hat{X}$, is defined as follows. The variate $X$ has density $P_s(x)$. The variate $\hat{X} = a(X + Y)$, where $Y$ is a Gaussian variate, independent of $X$, with

$$EY = 0, \qquad EY^2 = \frac{\beta A^2}{A^2 - 12\beta}, \qquad \text{and} \qquad a = \frac{A^2 - 12\beta}{A^2}.$$

Note that $E(X - \hat{X})^2 = \beta$. The information $I[p_0(x, \hat{x})]$ corresponding to $p_0(x, \hat{x})$ can also be evaluated numerically and is an upper bound to $R_{eq}(\beta)$. Figure 3 is a graph of these bounds on $R_{eq}(\beta)$, and also of $d_0$, the solution of $\rho_s R_{eq}(d_0) = C$.
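The claim $E(X - \hat{X})^2 = \beta$ for this construction can be checked directly, since $X - \hat{X} = (1 - a)X - aY$ with $X$, $Y$ independent and $EX^2 = A^2/12$. The sketch below does that algebra and a Monte Carlo cross-check; it does not attempt the numerical evaluation of $I[p_0(x, \hat{x})]$, and its names are hypothetical.

```python
import math
import random

def construction_params(A, beta):
    """a and E Y^2 of the test-channel construction Xhat = a (X + Y)."""
    if not 0 < beta < A * A / 12:
        raise ValueError("need 0 < beta < A^2/12 (the variance of X)")
    var_y = beta * A * A / (A * A - 12 * beta)
    a = (A * A - 12 * beta) / (A * A)
    return a, var_y

def mse_exact(A, beta):
    # E(X - Xhat)^2 = (1 - a)^2 E X^2 + a^2 E Y^2, with E X^2 = A^2 / 12
    a, var_y = construction_params(A, beta)
    return (1 - a) ** 2 * A * A / 12 + a * a * var_y

def mse_monte_carlo(A, beta, n=200_000, seed=1):
    a, var_y = construction_params(A, beta)
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.uniform(-A / 2, A / 2)          # X has density P_s = 1/A
        y = rng.gauss(0.0, math.sqrt(var_y))    # independent Gaussian Y
        total += (x - a * (x + y)) ** 2
    return total / n
```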

(iii) Say $\mathcal{X} = [-A/2, A/2]$. Let $P_s(x) = A^{-1}$ and $d(x, \hat{x}) = r(x - \hat{x})$ where, in addition to satisfying conditions (12), $r(u)$ satisfies

$$r(u) = r(v) \quad \text{if } u \equiv v \pmod{A}. \qquad (18)$$

[If, for example, $A = 2\pi$ and $\mathcal{X}$ represents an angle, then equation (18) must hold.] For $r(u)$ satisfying condition (18), the bound (15a) of Theorem 1 holds with equality, namely $R_{eq}(\beta) = H_s - H_1(\beta)$ (Section 3.1).

(iv) Threshold distortion: Let $\mathcal{X} = [-A/2, A/2]$ and let $d(x, \hat{x})$ be the "threshold" distortion defined by

$$d(x, \hat{x}) = d_\delta(x, \hat{x}) = r_\delta(x - \hat{x}), \qquad (19a)$$

where

$$r_\delta(u) = \begin{cases} 1, & |u| \ge \delta, \\ 0, & |u| < \delta. \end{cases} \qquad (19b)$$

Fig. 3 — Bounds on $\beta/A^2$ versus $R_{eq}(\beta)$, or $d_0/A^2$ versus $C/\rho_s$: (i) upper bound; (ii) lower bound (Theorem 1); (iii) lower bound (17a).



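In this case, the bound (15) of Theorem 1 is

$$R_{eq}(\beta) \ge H_s - h(\beta) - \log 2\delta - \beta \log\left(\frac{A}{2\delta} - 1\right), \qquad (20)$$

where $h(\beta)$ is defined in equation (9b). There is a case where inequality (20) is satisfied with equality, namely $P_s(x) = A^{-1}$ and $A/(2\delta) = K_0 = 1, 2, \ldots$. For this case we show in Section 3.1 that

$$R_{eq}(\beta) = \begin{cases} \log K_0 - h(\beta) - \beta \log(K_0 - 1), & 0 \le \beta \le \dfrac{K_0 - 1}{K_0}, \\[4pt] 0, & \beta \ge \dfrac{K_0 - 1}{K_0}. \end{cases} \qquad (21)$$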
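Inequality (20) follows from (15a) because, for the threshold distortion, the density $g_\beta$ of (14a) is uniform with mass $1 - \beta$ on $|x| < \delta$ and uniform with mass $\beta$ on the rest of the interval, which gives $H_1(\beta) = h(\beta) + \log 2\delta + \beta\log(A/2\delta - 1)$. With $P_s(x) = A^{-1}$ one has $H_s = \log A$, and when $A/(2\delta) = K_0$ the right member of (20) reduces to the first line of (21). A short numerical check of that reduction (hypothetical names):

```python
import math

def h(beta):
    """Binary entropy (9b) in nats."""
    if beta in (0.0, 1.0):
        return 0.0
    return -beta * math.log(beta) - (1 - beta) * math.log(1 - beta)

def H1_threshold(beta, A, delta):
    """Entropy (14b) of g_beta for r_delta: mass 1 - beta uniform on |x| < delta,
    mass beta uniform on delta <= |x| <= A/2."""
    return h(beta) + math.log(2 * delta) + beta * math.log(A / (2 * delta) - 1)

def bound_20(beta, A, delta):
    """Right member of (20) with H_s = log A (uniform source)."""
    return math.log(A) - H1_threshold(beta, A, delta)

def r_eq_21(beta, K0):
    """Equation (21)."""
    if beta >= (K0 - 1) / K0:
        return 0.0
    return math.log(K0) - h(beta) - beta * math.log(K0 - 1)
```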




Notice the striking similarity of equations (21) and (9) for the discrete $K$-ary source. We will have more to say about this later.

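When $A/(2\delta)$ is not an integer, we show (in Section 3.1) that the right member of (21), with $K_0$ replaced by $[A/2\delta]^+ = K_+$, is an upper bound to $R_{eq}(\beta)$. Thus, with inequality (20) (where $H_s = \log A$ for the uniform density),

$$\log\frac{A}{2\delta} - h(\beta) - \beta\log\left(\frac{A}{2\delta} - 1\right) \le R_{eq}(\beta) \le \log K_+ - h(\beta) - \beta\log(K_+ - 1). \qquad (22)$$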

A result for finite $T$ for the threshold distortion is as follows. Let $\mathcal{X} = [-A/2, A/2]$, $P_s(x) = A^{-1}$, and $d(x, \hat{x}) = d_\delta(x, \hat{x})$, the threshold distortion given in equations (19), as in example (iv) above. In the system of Fig. 1, let $d^*(T) = Q(T, A, \delta)$, where the dependence on $A$ and $\delta$ is indicated explicitly. The result of example (iv) [equation (21)] and equations (10) imply that for $A/(2\delta) = K_0$, $\lim_{T\to\infty} Q(T, A, \delta) = \lim_{T\to\infty} P_e(T, K_0) = \gamma(K_0, \rho_s, C)$. This correspondence between $Q$ and $P_e$ is extended to finite $T$ in Theorem 2 (proved in Section 3.2).

Theorem 2: Let $K_+ = [A/2\delta]^+$ and $K_- = [A/2\delta]^-$. For all $T$,

$$P_e(T, K_-) \le Q(T, A, \delta) \le P_e(T, K_+). \qquad (23)$$

The quantities $P_e$ and $Q$ are defined, of course, for the same channel and source output rate $\rho_s$.

A case of particular interest is $A/(2\delta) = K_0$, an integer, so that $K_+ = K_- = K_0$ and Theorem 2 yields

$$P_e(T, K_0) = Q(T, A, \delta), \quad \text{all } T. \qquad (24)$$

For this case we deduce from equation (24) that (for all $T$) the optimal encoder for the analog source is a $K_0$-level "uniform" quantizer with quantization levels $\{(2i - K_0 - 1)\delta\}_{i=1}^{K_0}$, followed by an optimal "digital" encoder. This is the only known case in which analog-to-digital conversion is known to be optimal for $T < \infty$ for the transmission of analog data.
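The mechanism behind (24) can be seen in a toy simulation. The sketch below (a hypothetical setup, not the optimal digital encoder of the paper) quantizes each source sample with the $K_0$-level uniform quantizer described above, sends the cell index over an assumed $K_0$-ary symmetric channel with no block coding ($n = 1$), and reconstructs the level of the received index. With $A/(2\delta) = K_0$, the threshold distortion $d_\delta$ of (19) equals 1 exactly when the index is received in error, apart from the measure-zero cell boundaries.

```python
import random

def cell_index(x, A, K0):
    """Index i in {0, ..., K0-1} of the quantizer cell (width 2*delta = A/K0) holding x."""
    return min(int((x + A / 2) // (A / K0)), K0 - 1)

def level(i, A, K0):
    """Reconstruction level (2(i+1) - K0 - 1) * delta, delta = A/(2 K0);
    this is the text's (2i - K0 - 1) delta with i counted from 1."""
    delta = A / (2 * K0)
    return (2 * (i + 1) - K0 - 1) * delta

def simulate(A=2.0, K0=8, p_err=0.1, n=100_000, seed=0):
    """Return (symbol errors, threshold-distortion events) over n samples sent
    through a hypothetical K0-ary symmetric channel with error probability p_err."""
    rng = random.Random(seed)
    delta = A / (2 * K0)
    sym_err = thr_err = 0
    for _ in range(n):
        x = rng.uniform(-A / 2, A / 2)          # uniform source, P_s = 1/A
        i = cell_index(x, A, K0)
        j = i
        if rng.random() < p_err:                # wrong symbol, chosen uniformly
            j = rng.choice([k for k in range(K0) if k != i])
        xhat = level(j, A, K0)
        sym_err += (j != i)
        thr_err += (abs(x - xhat) >= delta)     # d_delta of (19)
    return sym_err, thr_err
```

The two counts agree sample by sample, so in this toy setting the average threshold distortion and the digital error rate coincide.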

2.3 Case Where The Source Has No Statistics 

Suppose that the source output is, as in Section 2.2, a sequence of symbols from the source alphabet $\mathcal{X}$, which appear at a rate of $\rho_s$ per second. However, in this case, as distinct from above, we assume that there is no known statistical model for the source. Say that, as in Section 2.2, the encoder waits $T$ seconds, during which time $n = \rho_s T$ source symbols $\mathbf{x} = (x_1, x_2, \ldots, x_n) \in \mathcal{X}^n$ have appeared. Again as above, the encoder output is $f_E(\mathbf{x}) \in \mathcal{W}_T$, the channel output is $Z \in \mathcal{Z}_T$, and the decoder output is $\hat{\mathbf{X}} = f_D(Z) \in \mathcal{X}^n$. The encoder-decoder pair and the channel statistics induce a probability density for $\hat{\mathbf{X}}$ on $\mathcal{X}^n$ which depends on $\mathbf{x}$ (the source output). Denote this density by $f(\hat{\mathbf{x}} \mid \mathbf{x})$. Assuming, as in Section 2.2, that a non-negative distortion function $d(x, \hat{x})$ on $\mathcal{X} \times \mathcal{X}$ is given, the average distortion when the source output is $\mathbf{x}$ is

$$d(\mathbf{x}) = \int \left[\frac{1}{n}\sum_{k=1}^{n} d(x_k, \hat{x}_k)\right] f(\hat{\mathbf{x}} \mid \mathbf{x})\, d\hat{\mathbf{x}}, \qquad (25)$$

where $\mathbf{x} = (x_1, \ldots, x_n)$ and $\hat{\mathbf{x}} = (\hat{x}_1, \ldots, \hat{x}_n)$. Since we cannot take a meaningful statistical average over $\mathbf{x}$, we adopt as our fidelity criterion the "guaranteed" distortion

$$\bar{d} = \sup_{\mathbf{x}} d(\mathbf{x}). \qquad (26)$$

Let $\bar{d}^*(T)$ be the smallest attainable value of $\bar{d}$ for a given delay $T$ (which corresponds to $n = \rho_s T$).

For the special case where $\mathcal{X} = \{0, 1, \ldots, K-1\}$ and $d(x, \hat{x}) = d_H(x, \hat{x})$ [given by equation (5)], let $\bar{d}^*(T) = \bar{P}_e(T, K)$, where the dependence on $K$ is made explicit. Consider $P_e(T, K)$ (the average error rate of Section 2.2). Clearly,

$$\bar{P}_e(T, K) \ge P_e(T, K),$$

since the guaranteed error rate of any system is at least its average error rate for the uniform source of Section 2.2. The following theorem [taken together with equations (10)] shows that as $T \to \infty$, $P_e$ and $\bar{P}_e$ are asymptotically equal. The proof is in Section 3.4.

Theorem 3: For the communication system described above with $K$-ary source alphabet, source output rate $\rho_s$, and channel capacity $C$,

$$\lim_{T\to\infty} \bar{P}_e(T, K) = \gamma(K, \rho_s, C), \qquad (27)$$

where $\gamma(K, \rho_s, C)$ is given by inequality (10b).

A second important special case is $\mathcal{X} = [-A/2, A/2]$ and $d(x, \hat{x}) = d_\delta(x, \hat{x})$, the threshold distortion given by equations (19). In this case let $\bar{d}^*(T) = \bar{Q}(T, A, \delta)$. The quantity $\bar{Q}$ can be related to $\bar{P}_e$, and Theorem 4 (proved in Section 3.3) is analogous to Theorem 2, though somewhat sharper.

Theorem 4: For $0 < \delta \le A$, let $M(\delta)$ be the integer satisfying

$$M - 1 \le \frac{A}{2\delta} < M. \qquad (28a)$$

Then for all $T$,

$$\bar{Q}(T, A, \delta) = \bar{P}_e[T, M(\delta)]. \qquad (28b)$$

The quantities $\bar{P}_e$ and $\bar{Q}$ are defined, of course, for the same channel and source output rate $\rho_s$.

In contrast to Theorem 2, this theorem asserts the equality of corresponding values of $\bar{Q}$ and $\bar{P}_e$ for all values of $A/(2\delta)$. Also, as with Theorem 2, this theorem implies that the optimal encoder for the source $\mathcal{X} = [-A/2, A/2]$, with $d = d_\delta$ [and a fidelity criterion as in equation (26)], is a uniform quantizer [with $M(\delta)$ levels] followed by an optimal digital encoder (see part (i) of the proof of Theorem 4).
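Theorem 4 uses $M(\delta) = [A/2\delta]^- + 1$ levels rather than $K_+$. The reason is visible in the guaranteed (worst-case) error of a uniform quantizer: with $M$ equal cells the worst-case error is $A/(2M)$, which is strictly less than $\delta$ exactly when $M > A/(2\delta)$, as the strict inequality in the threshold distortion requires. A small sketch (names are ours; the grid search is only a numerical check):

```python
import math

def M_of_delta(A, delta):
    """The integer M(delta) of (28a): M - 1 <= A/(2 delta) < M."""
    return math.floor(A / (2 * delta)) + 1

def K_minus_plus(A, delta):
    """[A/2delta]^- and [A/2delta]^+ as used in Theorem 2."""
    r = A / (2 * delta)
    return math.floor(r), math.ceil(r)

def worst_case_error(A, M, grid=10_001):
    """Largest |x - xhat| over a grid of [-A/2, A/2] for the M-level uniform
    quantizer whose levels are the midpoints of M equal cells."""
    width = A / M
    worst = 0.0
    for k in range(grid):
        x = -A / 2 + A * k / (grid - 1)
        i = min(int((x + A / 2) // width), M - 1)
        center = -A / 2 + (i + 0.5) * width
        worst = max(worst, abs(x - center))
    return worst
```

For $A = 1$, $\delta = 1/4$: $K_- = K_+ = 2$ but $M(\delta) = 3$; two levels leave the endpoints at distance exactly $\delta$, so $d_\delta = 1$ there, while three levels keep every point within $1/6 < \delta$.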

Theorems 3 and 4 can be combined to obtain the following.

Corollary: For $0 < \delta \le A$, let $M(\delta)$ be as in Theorem 4. Then

$$\lim_{T\to\infty} \bar{Q}(T, A, \delta) = \gamma[M(\delta), \rho_s, C], \qquad (29)$$

where $\gamma$ is given by inequality (10b).

2.4 Generalization to Arbitrary Source Alphabets 

In this section we consider the case where the source alphabet $\mathcal{X}$ is an arbitrary space with an arbitrary metric or metric-like function defined on it. We then give a generalization of Theorem 4. First we give some preliminary definitions.

Let $\mathcal{X}$ be a set and let $\rho_0(x, \hat{x})$ be a real-valued function defined on $\mathcal{X} \times \mathcal{X}$ with the properties

$$\text{(i)}\quad \rho_0(x, \hat{x}) = \rho_0(\hat{x}, x), \qquad (30a)$$
$$\text{(ii)}\quad \rho_0(x, \hat{x}) \ge 0, \text{ with equality when } x = \hat{x}. \qquad (30b)$$

If in addition $\rho_0(x, \hat{x})$ satisfies

$$\text{(iii)}\quad \rho_0(x, \hat{x}) \le \rho_0(x, y) + \rho_0(y, \hat{x}), \qquad (30c)$$

then $\rho_0(x, \hat{x})$ is a metric; but we will not require inequality (30c) to hold. For $x \in \mathcal{X}$ and $\Delta > 0$, let $S_x(\Delta) = \{\hat{x} \in \mathcal{X} : \rho_0(x, \hat{x}) < \Delta\}$ be the (open) sphere of radius $\Delta$ about $x$.

A set $\mathcal{A} \subseteq \mathcal{X}$ is called a "$\Delta$-covering" (of $\mathcal{X}$) if $\bigcup_{x \in \mathcal{A}} S_x(\Delta)$ contains $\mathcal{X}$, and $\mathcal{A}$ is called a "$\Delta$-packing" (of $\mathcal{X}$) if $S_x(\Delta) \cap S_{\hat{x}}(\Delta)$ is empty for all $x, \hat{x} \in \mathcal{A}$, $x \ne \hat{x}$. Let $M_c(\Delta)$ be the minimum number of points which can constitute a $\Delta$-covering of $\mathcal{X}$, and let $M_p(\Delta)$ be the maximum number of points which can constitute a $\Delta$-packing. These quantities are related by the following lemma (proved in Section 3.4).
Lemma 1: Let $\eta = \sup_{x, y, z \in \mathcal{X}} \rho_0(x, y)/[\rho_0(x, z) + \rho_0(z, y)]$. Then for $\Delta > 0$,

$$M_c(2\eta\Delta) \le M_p(\Delta). \qquad (31)$$

In particular, if $\rho_0$ is a metric, $\eta \le 1$. Inequality (31) is of course meaningful only if $\eta < \infty$.

Now consider the communication system discussed in Section 2.3 with an arbitrary source space $\mathcal{X}$.‡ Let $\rho_0$ satisfy expressions (30a) and (30b), and define the "threshold" distortion $d_\delta(x, \hat{x})$ by

$$d_\delta(x, \hat{x}) = \begin{cases} 1, & \rho_0(x, \hat{x}) \ge \delta, \\ 0, & \rho_0(x, \hat{x}) < \delta. \end{cases} \qquad (32)$$

Let $\bar{d}$ be the guaranteed distortion defined by equation (26) with the distortion $d(x, \hat{x}) = d_\delta(x, \hat{x})$ [given by equation (32)]. Finally, let $G(T, \delta)$ be the smallest attainable value of $\bar{d}$ for a system with delay $T$. (The dependence of $G$ on $\delta$ is made explicit.) Of course, $G(T, \delta)$ also depends on $\rho_0$ as well as on the channel characteristics. The special case treated in Section 2.3 is $\mathcal{X} = [-A/2, A/2]$, $\rho_0(x, \hat{x}) = |x - \hat{x}|$; in this case $G(T, \delta) = \bar{Q}(T, A, \delta)$.

The following is a generalization of Theorem 4 and is proved in Section 3.3.

Theorem 5: Let $M_c(\Delta)$ and $M_p(\Delta)$ be as defined above for the source alphabet $\mathcal{X}$ [with a function $\rho_0(x, \hat{x})$]. Then $G(T, \delta)$ satisfies

$$\bar{P}_e[T, M_p(\delta)] \le G(T, \delta) \le \bar{P}_e[T, M_c(\delta)], \qquad (33)$$

where $\bar{P}_e$ is defined in Section 2.3. Note that $\bar{P}_e$ and $G$ are defined for the same channel and source output rate $\rho_s$.

Theorem 5 reduces to Theorem 4 on noting that for $\mathcal{X} = [-A/2, A/2]$ and $\rho_0(x, \hat{x}) = |x - \hat{x}|$,

$$M_p(\delta) = M_c(\delta) = M(\delta), \qquad (34)$$

where $M(\delta)$ is defined by inequality (28a). Let us remark that although $M_p = M_c$, the maximum $\delta$-packing is not in general identical to a minimum $\delta$-covering. For example, when $\delta = A/4$, $M(\delta) = 3$, and the maximum $\delta$-packing is unique, namely

$$\{-A/2,\ 0,\ A/2\},$$

which is not a $\delta$-covering. There are many $\delta$-coverings, for example

$$\{-A/3,\ 0,\ A/3\}.$$

‡ To be precise, we must assume that the space $\mathcal{X}$ and the encoder and decoder functions are measurable.
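The example with $\delta = A/4$ can be checked mechanically. On an interval, open spheres of radius $\delta$ about two points are disjoint exactly when the points are at least $2\delta$ apart, so placing points $2\delta$ apart starting from $-A/2$ gives a packing of the maximum size $M(\delta)$. The sketch below (hypothetical names; coverage is checked on a grid, not proved) confirms that this packing fails to cover while $\{-A/3, 0, A/3\}$ covers but is not a packing.

```python
def greedy_packing(A, delta, tol=1e-12):
    """Points -A/2, -A/2 + 2 delta, ... inside [-A/2, A/2]: a delta-packing of
    the interval with the maximum number of points."""
    pts, x = [], -A / 2
    while x <= A / 2 + tol:
        pts.append(x)
        x += 2 * delta
    return pts

def is_packing(pts, delta, tol=1e-12):
    """Open spheres of radius delta are pairwise disjoint (distance >= 2 delta)."""
    return all(abs(p - q) >= 2 * delta - tol
               for i, p in enumerate(pts) for q in pts[i + 1:])

def is_covering(pts, A, delta, grid=10_001):
    """Every grid point of [-A/2, A/2] lies in some open sphere of radius delta."""
    return all(any(abs(-A / 2 + A * k / (grid - 1) - p) < delta for p in pts)
               for k in range(grid))
```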

2.5 Some Applications 

2.5.1 Rate at Which $Q(T, A, \delta)$ Approaches Its Limit

Consider again the source with $\mathcal{X} = [-A/2, A/2]$, $P_s(x) = A^{-1}$, and distortion $d(x, \hat{x}) = d_\delta(x, \hat{x})$ [defined by expressions (19)]. Suppose further that $A/(2\delta) = K_0$, an integer, and that the channel capacity is $C = \rho_s \log K_0$. In this case $\gamma(K_0, \rho_s, C) = 0$ [see expressions (9) and (10)], so that from expressions (24) and (10a)

$$\lim_{T\to\infty} Q(T, A, \delta) = 0.$$

We now obtain a lower bound on the rate at which this limit is approached. From the first inequality in (11), using $n = \rho_s T$ and $K_0^n = e^{CT}$,

$$P_e(T, K_0) \ge \frac{1}{\rho_s T}\,\bar{\lambda}^*(T, e^{CT}). \qquad (35)$$

For those channels for which expression (4) holds, the right member of inequality (35) is asymptotic to $(2\rho_s T)^{-1}$. Combining expressions (24) and (35), we have

$$Q(T, A, \delta) \ge \frac{1}{2\rho_s T}\,[1 + \xi(T)], \qquad (36)$$

where $\xi(T) \to 0$ as $T \to \infty$. Thus, for the class of channels for which expression (4) holds and for these parameter values, $Q(T, A, \delta)$ approaches its limit no faster than $T^{-1}$. The determination of similar bounds on the rate of approach of $Q$ to its limit for other parameter values is an open question.

2.5.2 The sth-Mean Distortion

Consider the case where $\mathcal{X} = [-A/2, A/2]$ and the distortion is $d(x, \hat{x}) = |x - \hat{x}|^s$ ($s > 0$). When $P_s(x) = A^{-1}$, let the smallest attainable average distortion be $d^*(T) = \epsilon(T)$. For the case of no source statistics (as in Section 2.3), let the smallest attainable guaranteed distortion be $\bar{d}^*(T) = \bar{\epsilon}(T)$. We establish some properties of $\epsilon$ and $\bar{\epsilon}$ below.

For any random variable $Y$ (such that $|Y| \le A$), and any $\delta_1, \delta_2$ ($0 \le \delta_1, \delta_2 \le A$),

$$\delta_1^s \Pr\{|Y| \ge \delta_1\} \le E|Y|^s \le \delta_2^s \Pr\{|Y| < \delta_2\} + A^s \Pr\{|Y| \ge \delta_2\}. \qquad (37)$$
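Inequality (37) is elementary: the left half is Markov's inequality applied to $|Y|^s$, and the right half bounds $|Y|^s$ by $\delta_2^s$ on $\{|Y| < \delta_2\}$ and by $A^s$ elsewhere. A quick numerical check on an arbitrary discrete distribution (a hypothetical test case, not from the paper):

```python
import random

def members_of_37(values, probs, s, A, d1, d2):
    """Left member, E|Y|^s, and right member of (37) for a discrete Y."""
    assert all(abs(v) <= A for v in values) and abs(sum(probs) - 1) < 1e-12

    def pr_ge(t):                               # Pr{|Y| >= t}
        return sum(p for v, p in zip(values, probs) if abs(v) >= t)

    moment = sum(p * abs(v) ** s for v, p in zip(values, probs))
    left = d1 ** s * pr_ge(d1)
    right = d2 ** s * (1 - pr_ge(d2)) + A ** s * pr_ge(d2)
    return left, moment, right
```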
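It follows from inequality (37) that for arbitrary $\delta_1, \delta_2$ ($0 \le \delta_1, \delta_2 \le A$),

$$\delta_1^s\, Q(T, A, \delta_1) \le \epsilon(T) \le \delta_2^s[1 - Q(T, A, \delta_2)] + A^s\, Q(T, A, \delta_2), \qquad (38a)$$

and

$$\delta_1^s\, \bar{Q}(T, A, \delta_1) \le \bar{\epsilon}(T) \le \delta_2^s[1 - \bar{Q}(T, A, \delta_2)] + A^s\, \bar{Q}(T, A, \delta_2), \qquad (38b)$$

where $Q$ and $\bar{Q}$ are defined in Sections 2.2 and 2.3, respectively. Application of Theorems 2 and 4 (and $Q, \bar{Q} \ge 0$) yields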

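$$\delta_1^s\, P_e(T, K_-) \le \epsilon(T) \le \delta_2^s + A^s\, P_e(T, K_+), \qquad (39a)$$

and

$$\delta_1^s\, \bar{P}_e[T, M(\delta_1)] \le \bar{\epsilon}(T) \le \delta_2^s + A^s\, \bar{P}_e[T, M(\delta_2)], \qquad (39b)$$

where $K_+ = [A/2\delta_2]^+$, $K_- = [A/2\delta_1]^-$, and $M(\delta)$ is defined by inequality (28a). Thus $\epsilon$ and $\bar{\epsilon}$ too are related to the digital error rates $P_e$ and $\bar{P}_e$. Of course, $\delta_1$ and $\delta_2$ may be chosen to yield the tightest bounds.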
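As $T \to \infty$, (10a) lets one replace $P_e(T, K)$ in (39a) by $\gamma(K, \rho_s, C)$, assuming the limit $\epsilon_\infty = \lim_{T\to\infty}\epsilon(T)$ exists. For a given $K_-$ the lower bound is largest at $\delta_1 = A/(2K_-)$, and for a given $K_+$ the upper bound is smallest at $\delta_2 = A/(2K_+)$, so it suffices to scan integers $K$. A sketch under those assumptions (hypothetical names; $\gamma$ is computed by bisection on (9a), as in (10b)):

```python
import math

def r_eq(K, beta):
    """Equation (9a)-(9b), nats per source symbol."""
    if beta >= (K - 1) / K:
        return 0.0
    if beta <= 0.0:
        return math.log(K)
    h = -beta * math.log(beta) - (1 - beta) * math.log(1 - beta)
    return math.log(K) - h - beta * math.log(K - 1)

def gamma(K, rho_s, C, tol=1e-12):
    """Smallest beta with rho_s * R_eq(beta) <= C, inequality (10b)."""
    if C >= rho_s * math.log(K):
        return 0.0
    lo, hi = 0.0, (K - 1) / K
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if rho_s * r_eq(K, mid) <= C:
            hi = mid
        else:
            lo = mid
    return hi

def eps_inf_bounds(A, s, rho_s, C, K_max=2000):
    """Limiting form of (39a): max over K of (A/2K)^s gamma(K) and
    min over K of (A/2K)^s + A^s gamma(K), taking delta = A/(2K)."""
    lower, upper = 0.0, A ** s          # trivially 0 <= eps_inf <= A^s
    for K in range(2, K_max + 1):
        delta = A / (2 * K)             # then [A/2delta]^- = [A/2delta]^+ = K
        g = gamma(K, rho_s, C)
        lower = max(lower, delta ** s * g)
        upper = min(upper, delta ** s + A ** s * g)
    return lower, upper
```

With $A = 1$, $s = 2$, $\rho_s = 1$, $C = 3$ nats, the upper bound is attained at the largest $K$ with $\log K \le C/\rho_s$ (here $K = 20$), giving $(1/40)^2$; the lower bound comes from values of $K$ slightly above $e^{C/\rho_s}$.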

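Examples

(i) Since we know the asymptotic values of $P_e$ and $\bar{P}_e$ as $T \to \infty$, we can apply inequalities (39) to obtain estimates of the limiting values $\epsilon_\infty = \lim_{T\to\infty} \epsilon(T)$ and $\bar{\epsilon}_\infty = \lim_{T\to\infty} \bar{\epsilon}(T)$. For example, when the channel capacity $C$ is large, setting $A/2\delta_1 = \exp[(C/\rho_s)(1 + \Delta_1)]$ and $A/2\delta_2 = \exp[(C/\rho_s)(1 - \Delta_2)]$ ($\Delta_1, \Delta_2 > 0$) yields, after some computation,

$$\epsilon_\infty = \exp\left\{-\frac{sC}{\rho_s}[1 + \xi_1(C)]\right\}, \qquad (40a)$$
$$\bar{\epsilon}_\infty = \exp\left\{-\frac{sC}{\rho_s}[1 + \xi_2(C)]\right\}, \qquad (40b)$$

where $\xi_1, \xi_2 \to 0$ as $C \to \infty$. Thus for large $C$, $\epsilon_\infty$ and $\bar{\epsilon}_\infty$ decay roughly exponentially in $C$.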

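Let us remark that parts of inequalities (40) are obtainable by other means. Specifically, a lower bound of the form $\epsilon_\infty \ge c\,\exp[-sC/\rho_s]$, where $c > 0$ depends on $s$ and $A$ but not on $C$, follows from inequality (16). Further, bounds of the form $\exp[-(sC/\rho_s)(1 + \xi_1)]$ on $\epsilon_\infty$ and $\exp[-(sC/\rho_s)(1 + \xi_2)]$ on $\bar{\epsilon}_\infty$ can be deduced from the work of Panter and Dite on quantization.⁵ Finally, the remaining bound in (40), of the form $\exp[-(sC/\rho_s)(1 + \xi)]$, is new.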
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(it) In this example, we apply the first inequality of (39a) to show the 
possible gains (with the sth mean criterion) obtainable by using coding 
in a particular (though quite typical) case. 

Suppose that the channel is the additive white Gaussian noise channel with average power $P_0$, one-sided spectral density $N_0$, and no bandwidth constraint (Ref. 6). To begin with, suppose $T = 1/\rho_s$, so that $n = 1$ and there is no "coding"; that is, each T-second channel input depends on exactly one source output. When the source is the K-ary digital source (with equidistributed symbols), it is known that the minimum attainable error rate is lower bounded by†



PJ.T.K) :> ^[_p« r 



2)(2AT )_ 
(41)

where $\Phi(a) = (2\pi)^{-1/2} \int_{-\infty}^{a} e^{-u^2/2}\, du$ is the cumulative error function.

We now apply the lower bound of inequality (39a) together with inequality (41) to obtain a lower bound on $\epsilon_s(T)$ when the channel signal power $P_0$ is made large while $T = 1/\rho_s$ is held fixed. Setting $\delta_1 = P_0^{-1}$, we obtain from inequalities (39a) and (41), and from $\Phi(-\sqrt{a}) \sim (2\pi a)^{-1/2} \exp(-a/2)$ as $a \to \infty$, that (with $T = \rho_s^{-1}$ held fixed)

$$\epsilon_s(\rho_s^{-1}) \ge \exp\left\{-\frac{P_0}{2N_0\rho_s}[1 + \xi_3(P_0)]\right\}, \qquad (42)$$

where $\xi_3(P_0) \to 0$ as $P_0 \to \infty$.

Now suppose that for a given channel (and a given $P_0$) we allow T to become large. In other words, we permit "source coding" in blocks of length $n = \rho_s T$. Since the channel capacity is $C = P_0/N_0$, we have from equation (40a) that

$$\lim_{T \to \infty} \epsilon_s(T) = \epsilon_\infty = \exp\left\{-\frac{sP_0}{N_0\rho_s}[1 + \xi_4(P_0)]\right\}, \qquad (43)$$

where $\xi_4(P_0) \to 0$ as $P_0 \to \infty$.

Now let $\epsilon > 0$ be arbitrary, and let $P_1$ be sufficiently large that for $P_0 \ge P_1$,

$$|\xi_3(P_0)|,\ |\xi_4(P_0)| < \epsilon.$$

Then from inequality (42), with $P_0 \ge P_1$, the best attainable mean sth error with no coding is bounded by

$$\epsilon_s(\rho_s^{-1}) \ge \exp\left[-\frac{P_0}{2N_0\rho_s}(1 + \epsilon)\right]. \qquad (44)$$

†This bound follows from Ref. 1 [equation (82)] when the signal energy nP in that reference is replaced by $P_0T$, our signal energy, and M is replaced by K.

The best attainable sth error with infinite delay T is, from equation (43) with $P_0 \ge P_1$, bounded by

$$\epsilon_\infty \le \exp\left[-\frac{sP_0}{N_0\rho_s}(1 - \epsilon)\right]. \qquad (45)$$

We conclude that coding with large delay offers a saving of at least a factor of (2s) in power $P_0$ or rate $\rho_s$ (when $P_0 \ge P_1$). This, of course, is interesting when $s > 1/2$. Similar results for $s = 2$ have been derived by Ziv and Zakai (Ref. 7). This result can be generalized to arbitrary n (here we studied $n = 1$) and to arbitrary channels simply by using appropriate bounds on $P_e(T, K)$.
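The factor 2s is the ratio of the exponents in (44) and (45) as $\epsilon \to 0$. A trivial Python sketch of that arithmetic; the function names and the numerical values are illustrative assumptions only.

```python
import math

def exponent_no_coding(P0, N0, rho_s):
    """Exponent P0 / (2 N0 rho_s) of the lower bound (44), with epsilon -> 0."""
    return P0 / (2 * N0 * rho_s)

def exponent_large_delay(P0, N0, rho_s, s):
    """Exponent s P0 / (N0 rho_s) of the upper bound (45), with epsilon -> 0."""
    return s * P0 / (N0 * rho_s)

# Power an uncoded (n = 1) system would need to match the exponent that large-delay
# coding reaches with power P0: solve P0' / (2 N0 rho_s) = s P0 / (N0 rho_s) for P0'.
P0, N0, rho_s, s = 10.0, 1.0, 1.0, 2.0
P0_uncoded = 2 * s * P0
print(P0_uncoded / P0)  # 4.0, the factor 2s for s = 2
```

The same factor appears if the rate is traded instead of the power: dividing $\rho_s$ by 2s in the uncoded exponent gives the large-delay exponent.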

III. PROOFS OF THEOREMS 

3.1 Proof of Theorem 1 and Related Examples 

3.1.1 Proof of Theorem 1 

Shannon [Ref. 2, pp. 155-156] has shown that for a difference distortion measure $d(x, \hat x) = r(x - \hat x)$,

$$R_{eq}(\beta) \ge H_s - \phi(\beta), \qquad (46)$$

where $H_s$ is given by equation (15b) and $\phi(\beta)$ is the maximum attainable entropy $H\{f(x)\}$ for a probability density $f(x)$ which satisfies

$$\int_{-\infty}^{\infty} r(x) f(x)\, dx \le \beta. \qquad (47)$$

The entropy $H\{f(x)\}$ is defined by

$$H\{f(x)\} = -\int_{-\infty}^{\infty} f(x) \log f(x)\, dx. \qquad (48)$$

A trivial modification of Shannon's argument shows that when $\mathcal{X} = [-A/2, A/2]$, inequality (46) remains valid if $f(x)$ is further restricted to satisfy

$$f(x) = 0, \quad |x| > \frac{A}{2}. \qquad (49)$$

Now the density $g_\beta(x)$ [defined by expressions (13) and (14a)] satisfies conditions (47) and (49) and has entropy $H_1(\beta)$ [defined by equation (14)]. We prove Theorem 1 by showing that if the density $f(x)$ satisfies conditions (47) and (49), then $H\{f(x)\} \le H_1(\beta)$.

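Let us write $g_\beta(x) = Be^{-\lambda r(x)}$, where $\lambda = \lambda(\beta)$ and where

$$B = \left[\int_{-A/2}^{A/2} e^{-\lambda r(x)}\, dx\right]^{-1}.$$

Then

$$H_1(\beta) = -\int_{-A/2}^{A/2} g_\beta(x) \log g_\beta(x)\, dx = -\log B + \lambda \int_{-A/2}^{A/2} r(x) g_\beta(x)\, dx = -\log B + \lambda\beta.$$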

Since $f(x)$ satisfies condition (47),

$$H_1(\beta) \ge -\log B + \lambda \int_{-A/2}^{A/2} r(x) f(x)\, dx = -\int_{-A/2}^{A/2} f(x) \log\left[Be^{-\lambda r(x)}\right] dx = -\int_{-A/2}^{A/2} f(x) \log g_\beta(x)\, dx.$$



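Thus

$$H\{f(x)\} - H_1(\beta) \le -\int_{-A/2}^{A/2} f(x) \log f(x)\, dx + \int_{-A/2}^{A/2} f(x) \log g_\beta(x)\, dx = \int_{-A/2}^{A/2} f(x) \log \frac{g_\beta(x)}{f(x)}\, dx \le \int_{-A/2}^{A/2} f(x)\left[\frac{g_\beta(x)}{f(x)} - 1\right] dx = 1 - 1 = 0,$$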

where the second inequality follows from $\log u \le u - 1$. Theorem 1 follows.

Note that Theorem 1 will hold for $A = \infty$ as long as we can find $g_\beta(x)$. Examination of the derivation which establishes the existence of $g_\beta(x)$ (Ref. 3, Appendix A) shows that Theorem 1 is valid, in particular, for $A = \infty$ and $r(x) = |x|^s$, $s > 0$.
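For $r(x) = |x|^s$ the density $g_\beta(x) = Be^{-\lambda r(x)}$ and its entropy $H_1(\beta) = -\log B + \lambda\beta$ can be computed numerically. The sketch below is one way to do it, assuming a midpoint-rule quadrature and bisection on λ (our choices, not the paper's); it handles only $0 < \beta < (A/2)^s/(s+1)$, where λ > 0, and moderate β (very small β makes the quadrature inaccurate). For large A its output should approach the familiar maximum-entropy values of the Laplace (s = 1) and Gaussian (s = 2) densities.

```python
import math

def _z_and_m(lam, A, s, n=4000):
    """Midpoint-rule values of Z = int e^{-lam|x|^s} dx and M = int |x|^s e^{-lam|x|^s} dx
    over [-A/2, A/2].  Both integrands are even, so integrate over [0, A/2] and double."""
    h = (A / 2) / n
    z = m = 0.0
    for i in range(n):
        r = ((i + 0.5) * h) ** s
        w = math.exp(-lam * r)
        z += w
        m += r * w
    return 2 * h * z, 2 * h * m

def _mean_r(lam, A, s):
    z, m = _z_and_m(lam, A, s)
    return m / z  # E r(X) under the density proportional to e^{-lam r(x)}

def max_entropy(beta, A, s, iters=80):
    """(lam, B, H1) for g_beta(x) = B exp(-lam |x|^s) on [-A/2, A/2] with E|X|^s = beta."""
    if not 0 < beta < (A / 2) ** s / (s + 1):   # (A/2)^s/(s+1) is E|X|^s when lam = 0
        raise ValueError("this sketch needs 0 < beta < (A/2)^s/(s+1)")
    lo, hi = 0.0, 1.0
    while _mean_r(hi, A, s) > beta:             # grow the bracket; E r(X) decreases in lam
        lo, hi = hi, 2 * hi
    for _ in range(iters):                      # bisection on lam
        mid = 0.5 * (lo + hi)
        if _mean_r(mid, A, s) > beta:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    z, _ = _z_and_m(lam, A, s)
    B = 1 / z
    return lam, B, -math.log(B) + lam * beta    # H_1(beta) = -log B + lambda * beta

lam, B, H1 = max_entropy(beta=0.5, A=10.0, s=1.0)
# Expected to be close to the Laplace values lambda = 2 and H1 = 1 (nats).
```

Consistent with Theorem 1, any other density on $[-A/2, A/2]$ with $E|X| \le 0.5$, for instance the uniform density on [-1, 1] with entropy log 2, has entropy no larger than the computed $H_1$.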

3.1.2 Determination of $R_{eq}(\beta)$ in Example (iii)

For $\mathcal{X} = [-A/2, A/2]$, $P_s(x) = A^{-1}$, and $d(x, \hat x) = r(\hat x - x)$, where $r(w)$ satisfies conditions (12), we have $H_s = \log A$. Theorem 1 implies

$$R_{eq}(\beta) \ge \log A - H_1(\beta). \qquad (50)$$

We now show that if, in addition, $r(w)$ satisfies equation (18), then inequality (50) is satisfied with equality. Let $X$ and $\hat X$ be random variables such that the density for $X$ is $P_s(x) = A^{-1}$ $(|x| \le A/2)$, and $\hat X = X + Y$, where the random variable $Y$ is independent of $X$ and has density $g_\beta(y) = Be^{-\lambda r(y)}$ [defined by equations (13) and (14a)]. The information of $p(x, \hat x)$, the joint density for $X, \hat X$, is

$$I\{p(x, \hat x)\} = H\{p_2(\hat x)\} - \int_{-A/2}^{A/2} P_s(x) H\{p(\hat x \mid x)\}\, dx,$$

where $p_2(\hat x)$ is the density for $\hat X$, $p(\hat x \mid x)$ is the conditional density for $\hat X$ given that $X = x$, and $H\{\ \}$ is the entropy defined in equation (48).
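Now $p(x, \hat x) = P_s(x) p(\hat x \mid x) = A^{-1}Be^{-\lambda r(\hat x - x)}$, so that

$$p_2(\hat x) = \int_{-A/2}^{A/2} A^{-1}Be^{-\lambda r(\hat x - x)}\, dx = A^{-1}B \int_{\hat x - A/2}^{\hat x + A/2} e^{-\lambda r(u)}\, du.$$

When $\hat x \ge 0$ this becomes, letting $v = u - A$ and using equation (18),

$$p_2(\hat x) = A^{-1}B \int_{\hat x - A/2}^{A/2} e^{-\lambda r(u)}\, du + A^{-1}B \int_{A/2}^{\hat x + A/2} e^{-\lambda r(u)}\, du = A^{-1}B \int_{\hat x - A/2}^{A/2} e^{-\lambda r(u)}\, du + A^{-1}B \int_{-A/2}^{\hat x - A/2} e^{-\lambda r(v)}\, dv.$$

Hence, since $\int_{-A/2}^{A/2} g_\beta(x)\, dx = 1$,

$$p_2(\hat x) = A^{-1}B \int_{-A/2}^{A/2} e^{-\lambda r(u)}\, du = A^{-1}.$$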

For $\hat x < 0$, a similar proof yields $p_2(\hat x) = A^{-1}$. Thus $H\{p_2(\hat x)\} = \log A$. Further, $p(\hat x \mid x) = g_\beta(\hat x - x)$, and a similar use of equation (18) yields $H\{p(\hat x \mid x)\} = H_1(\beta)$, independent of $x$. Thus we conclude that $I\{p(x, \hat x)\} = \log A - H_1(\beta)$. Since $p(x, \hat x) \in \mathcal{M}(\beta)$, this and inequality (50) imply $R_{eq}(\beta) = \log A - H_1(\beta)$.

3.1.3 Proof for Example (iv)

We first verify equation (21) for the case $A/(2\delta) = K_0$, an integer. That $R_{eq}(\beta)$ is greater than or equal to the right member of equation (21) follows from inequality (20) (since $H_s = \log A$) and from $R_{eq}(\beta) \ge 0$. To show that $R_{eq}(\beta)$ is less than or equal to the right member of expression (21), we produce a density $p_0(x, \hat x)$ for which $I\{p_0(x, \hat x)\}$ equals the right member of equation (21). But first we digress to define "entropy" for a discrete random variable.

Consider a discrete probability density $f(x) = \sum_i a_i \delta(x - x_i)$. Then the "discrete entropy" of $f(x)$ is defined by

$$H_D\{f(x)\} = -\sum_i a_i \log a_i. \qquad (51)$$
Now say that $p(x, \hat x)$ is the probability density for two random variables $X, \hat X$, such that $\hat X$ takes values at a countable number of points. Then the marginal density for $\hat X$, denoted $p_2(\hat x)$, and the conditional density for $\hat X$ given $X = x$ [denoted $p(\hat x \mid x)$] are discrete densities. It is easy to show that the information can be written

$$I\{p(x, \hat x)\} = H_D\{p_2(\hat x)\} - \int p_1(x) H_D\{p(\hat x \mid x)\}\, dx, \qquad (52)$$

where $p_1(x)$ is the marginal density for $X$.

Return now to Example (iv). Let $0 \le \beta \le (K_0 - 1)/K_0$, and let $p_0(x, \hat x)$ be the density for $X, \hat X$, where $X$ has density $P_s(x) = A^{-1}$ and $\hat X$ has conditional density $p_0(\hat x \mid x)$ given as follows. Partition the interval $[-A/2, A/2]$ into $K_0$ subintervals $\{J_i\}_{i=0}^{K_0 - 1}$ of width $2\delta$. Let $\hat x_i$ be the midpoint of $J_i$ $(i = 0, 1, 2, \ldots, K_0 - 1)$. Then for $x \in J_i$,

$$p_0(\hat x \mid x) = (1 - \beta)\,\delta(\hat x - \hat x_i) + \frac{\beta}{K_0 - 1} \sum_{j \ne i} \delta(\hat x - \hat x_j).$$

In other words, $\hat X$ is an imperfectly quantized version of $X$. With probability $(1 - \beta)$, $\hat X$ is the midpoint of the subinterval in which $X$ lies, and with probability $\beta$, $\hat X$ is uniformly distributed among the remaining $(K_0 - 1)$ midpoints. Note that $P_s(x)$ and $p_0(\hat x \mid x)$ together determine $p_0(x, \hat x)$, and that $p_0(x, \hat x) \in \mathcal{M}(\beta)$.

Further, by symmetry, $\hat X$ is uniformly distributed on the $K_0$ midpoints, so that

$$H_D\{p_{02}(\hat x)\} = \log K_0,$$

where $p_{02}(\hat x)$ is the marginal density for $\hat X$ [corresponding to $p_0(x, \hat x)$]. Also

$$H_D\{p_0(\hat x \mid x)\} = h(\beta) + \beta \log(K_0 - 1),$$

independent of $x$. Thus equation (52) yields

$$I\{p_0(x, \hat x)\} = \log K_0 - h(\beta) - \beta \log(K_0 - 1),$$

the right member of expression (21). This establishes equation (21) for $0 \le \beta \le (K_0 - 1)/K_0$. Since $R_{eq}[(K_0 - 1)/K_0] = 0$ and $R_{eq}(\beta)$ is nonincreasing, we have $R_{eq}(\beta) = 0$ for $\beta \ge (K_0 - 1)/K_0$, establishing expression (21).
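The agreement between the closed form (21) and the information of the "imperfect quantizer" $p_0$ computed from (52) can be checked numerically. A minimal Python sketch; the function names are ours, and entropies are in nats.

```python
import math

def binary_entropy(beta):
    """h(beta) in nats, with h(0) = h(1) = 0."""
    return -sum(p * math.log(p) for p in (beta, 1 - beta) if p > 0)

def r_eq_formula(beta, K0):
    """Right member of (21): log K0 - h(beta) - beta log(K0 - 1) for beta < (K0-1)/K0, else 0."""
    if beta >= (K0 - 1) / K0:
        return 0.0
    return math.log(K0) - binary_entropy(beta) - beta * math.log(K0 - 1)

def info_of_p0(beta, K0):
    """I{p_0(x, x_hat)} from (52) for the imperfect quantizer p_0: X_hat is the correct
    midpoint w.p. 1 - beta, otherwise uniform on the other K0 - 1 midpoints."""
    row = [1 - beta] + [beta / (K0 - 1)] * (K0 - 1)        # p_0(x_hat | x) on the K0 midpoints
    h_cond = -sum(p * math.log(p) for p in row if p > 0)   # H_D{p_0(x_hat | x)}, same for every x
    return math.log(K0) - h_cond                           # H_D{p_02} = log K0 by symmetry

print(r_eq_formula(0.1, 4), info_of_p0(0.1, 4))  # the two values agree
```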

It remains to verify the upper bound of expression (22). But this follows immediately on noting that for fixed A and β, $R_{eq}(\beta)$ is a nonincreasing function of δ. Thus decreasing δ to $\delta' = A/(2[A/2\delta]^+)$ does not decrease $R_{eq}(\beta)$. Since $A/(2\delta')$ is an integer, we can apply expression (21) to obtain the upper bound of expression (22).

3.2 Proof of Theorem 2 

Theorem 2 relates the attainable distortions for a digital source and 
an analog source when connected to a given channel. The proof is in 
two parts [corresponding to the two inequalities in expression (23)], 
the second of which uses a bounding technique introduced by Ziv and Zakai (Ref. 7).

In part (i) we are given an encoder and decoder for the digital source (with appropriate parameters) which, when connected to the channel as in Fig. 1, result in an average Hamming distortion $d = d_H$. We show how to quantize the outputs of the analog source (with appropriate parameters) to essentially simulate the digital source. When this quantizer is connected to the digital encoder, we show that we attain an average distortion for the analog source $d_\delta \le d_H$. This leads us directly to the second inequality of expression (23).

In part (ii) we establish the first inequality of expression (23) in an 
essentially dual way. We begin by assuming the existence of an analog 
encoder and decoder. We then show how to modulate the outputs of 
the digital source to virtually simulate the analog source. Unfortunately, 
this is not as easy as the quantization in part (i), and we have to make 
use of an "averaging" argument in the course of the proof. 

(i) Let us denote by $S_a$ the analog source whose output is a sequence $X_1, X_2, \ldots$ of independent random variables, each uniformly distributed on the source space $\mathcal{X}_a = [-A/2, A/2]$. The random variables appear at a rate of $\rho_s$ per second. For this source we use the distortion $d(x, \hat x) = d_\delta(x, \hat x)$ defined by equations (19). Assume first that $A/(2\delta) = K_0$, an integer, and consider the following (uniform) quantizer. Partition the interval $[-A/2, A/2]$ into $K_0$ subintervals $\{I_i\}_{i=0}^{K_0 - 1}$ of width $2\delta$, where

$$I_i = (e_i, e_{i+1}], \quad i = 0, 1, \ldots, K_0 - 1, \qquad (53a)$$

and

$$e_i = 2\delta\left(i - \frac{K_0}{2}\right), \quad i = 0, 1, \ldots, K_0. \qquad (53b)$$

To be precise, the first interval $I_0$ should be closed on the left. The quantizer q is defined by

$$q(x) = i, \quad \text{if } x \in I_i \quad (0 \le i \le K_0 - 1). \qquad (54)$$
Let us now consider the digital source $S_d$ whose output is a sequence $S_1, S_2, \ldots$ of independent discrete random variables, each uniformly distributed on the $K_0$-ary set $\mathcal{X}_d = \{0, 1, \ldots, K_0 - 1\}$. These random variables also appear at $\rho_s$ per second. (Note that we use $S_k$ instead of $X_k$ as in Section II to distinguish the outputs of $S_d$ from those of $S_a$.) Let the distortion be $d = d_H$, as defined in equation (5).

Suppose that $S_d$ can be connected with delay T to a channel as in Fig. 1 with (digital) encoder $f_E^{(d)}$ and decoder $f_D^{(d)}$, and average distortion $d_H$. We now show how to connect the analog source $S_a$ to the channel [with the help of $f_E^{(d)}$ and $f_D^{(d)}$] to attain an average distortion $d_\delta \le d_H$. Consider the system in Fig. 4. In T seconds the output of the analog source is an n-vector $(n = \rho_s T)$ $\mathbf{X} = (X_1, \ldots, X_n)$. The "quantizer" output is the n-vector $\mathbf{S} = (S_1, S_2, \ldots, S_n)$, where $S_k = q(X_k)$ $(k = 1, 2, \ldots, n)$. Note that the $S_k$ are independent and uniformly distributed on $\{0, 1, \ldots, K_0 - 1\}$, as are the outputs of the digital source $S_d$. The digital encoder and decoder $f_E^{(d)}$ and $f_D^{(d)}$ are as given above, and the output of the latter is the $K_0$-ary vector $\hat{\mathbf{S}} = (\hat S_1, \ldots, \hat S_n)$. Thus

$$E d_H^{(n)}(\mathbf{S}, \hat{\mathbf{S}}) = d_H.$$

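The "converter" output is the n-vector $\hat{\mathbf{X}} = (\hat X_1, \hat X_2, \ldots, \hat X_n)$, where

$$\hat X_k = (2\hat S_k - K_0 + 1)\delta.$$

In other words, if $\hat S_k = i$, then $\hat X_k$ is the midpoint $(e_i + e_{i+1})/2$ of the ith subinterval. Disregarding the case when $X_k$ is equal to one of the endpoints $e_i$ of the subintervals (an event with zero probability), it is clear that $|X_k - \hat X_k| \le \delta$ if and only if $\hat S_k = S_k$ $(k = 1, 2, \ldots, n)$. Thus $d_\delta(X_k, \hat X_k) = d_H(S_k, \hat S_k)$ for every k, and it follows that

$$d_\delta = E d_\delta^{(n)}(\mathbf{X}, \hat{\mathbf{X}}) = E d_H^{(n)}(\mathbf{S}, \hat{\mathbf{S}}) = d_H.$$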



Minimizing over the digital encoder-decoder, we obtain

$$Q(T, A, \delta) \le P_e(T, K_0), \qquad (55)$$

when $A/(2\delta)$ is an integer. The second inequality of expression (23) follows on noting that $Q(T, A, \delta)$ is a nonincreasing function of δ. Thus decreasing δ to $\delta' = A/(2[A/2\delta]^+)$ does not result in a decrease in $Q(T, A, \delta)$. Since $A/(2\delta')$ is an integer, we can apply inequality (55) to obtain the second inequality of expression (23). This completes part (i).

Fig. 4 — An analog communication scheme.
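The identity $d_\delta(X_k, \hat X_k) = d_H(S_k, \hat S_k)$ behind (55) can be illustrated by simulating the quantizer (54) and the converter of Fig. 4. The sketch below assumes a hypothetical digital link that corrupts each symbol independently with probability `p_err`; it stands in for the encoder-channel-decoder $f_E^{(d)}, f_D^{(d)}$ and is not part of the paper.

```python
import math
import random

def quantize(x, delta, K0):
    """q(x) of (54): the i with x in I_i = (e_i, e_{i+1}], e_i = 2*delta*(i - K0/2).
    The first interval I_0 is closed on the left, as the text specifies."""
    A = 2 * delta * K0
    if not -A / 2 <= x <= A / 2:
        raise ValueError("x must lie in [-A/2, A/2]")
    i = math.ceil((x + A / 2) / (2 * delta)) - 1   # e_i < x <= e_{i+1}
    return max(i, 0)                               # x = -A/2 belongs to I_0

def convert(s_hat, delta, K0):
    """Converter of Fig. 4: the midpoint (2*s_hat - K0 + 1)*delta of I_{s_hat}."""
    return (2 * s_hat - K0 + 1) * delta

def d_delta(x, x_hat, delta):
    """Distortion d_delta: 0 if |x - x_hat| <= delta, else 1."""
    return 0 if abs(x - x_hat) <= delta else 1

def count_mismatches(delta=0.25, K0=4, p_err=0.2, trials=10_000, seed=1):
    """Number of trials in which d_delta(X_k, X_hat_k) differs from d_H(S_k, S_hat_k),
    using the hypothetical digital link described above."""
    rng = random.Random(seed)
    A = 2 * delta * K0
    mismatches = 0
    for _ in range(trials):
        x = rng.uniform(-A / 2, A / 2)
        s = quantize(x, delta, K0)
        s_hat = rng.choice([j for j in range(K0) if j != s]) if rng.random() < p_err else s
        mismatches += d_delta(x, convert(s_hat, delta, K0), delta) != int(s_hat != s)
    return mismatches

print(count_mismatches())  # expected: 0
```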

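(ii) Let us suppose that the analog source $S_a$ defined in part (i) is connected with delay T to a channel as in Fig. 1. The T-second source output is the n-vector $\mathbf{X} = (X_1, X_2, \ldots, X_n)$ and the decoder output is the n-vector $\hat{\mathbf{X}} = (\hat X_1, \hat X_2, \ldots, \hat X_n)$. Say we attain an average distortion

$$d_\delta = E d_\delta^{(n)}(\mathbf{X}, \hat{\mathbf{X}}).$$

Letting $E[d_\delta^{(n)}(\mathbf{X}, \hat{\mathbf{X}}) \mid \mathbf{X} = \mathbf{x}]$ be the conditional expectation of $d_\delta^{(n)}(\mathbf{X}, \hat{\mathbf{X}})$ given $\mathbf{X} = \mathbf{x}$, we can write

$$d_\delta = \int_{[-A/2, A/2]^n} E[d_\delta^{(n)} \mid \mathbf{X} = \mathbf{x}]\, \frac{d\mathbf{x}}{A^n}. \qquad (56)$$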

Suppose that $A/(2\delta) = K_0$, an integer. Let us partition the interval $[-A/2, A/2]$ into $K_0$ subintervals of width $2\delta$ as in equations (53). Let $\mathcal{S}$ be the set of left end-points of these subintervals, that is,

$$\mathcal{S} = \{e_0, e_1, \ldots, e_{K_0 - 1}\}. \qquad (57)$$
Now consider the n-cube $[-A/2, A/2]^n$. Note that the random n-vector $\mathbf{X}$ is uniformly distributed on this cube. The partition of the interval $[-A/2, A/2]$ defines a partition of the n-cube into $K_0^n$ subcubes, each the product of n subintervals. Let the members of $\mathcal{S}^n$ be denoted by the n-vectors $\boldsymbol{\xi}_j$, $j = 1, \ldots, K_0^n$, and let $C_j$ be the corresponding subcube. (That is, $C_j$ is the product of the subintervals whose left end-points are the coordinates of $\boldsymbol{\xi}_j$.) Then clearly,

$$\left[-\frac{A}{2}, \frac{A}{2}\right]^n = \biguplus_{j=1}^{K_0^n} C_j,$$
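where $\biguplus$ denotes disjoint union. Thus we can rewrite equation (56) as

$$d_\delta = \sum_{j=1}^{K_0^n} \int_{C_j} \frac{1}{A^n} E[d_\delta^{(n)} \mid \mathbf{X} = \mathbf{x}]\, d\mathbf{x} = \sum_{j=1}^{K_0^n} \frac{1}{K_0^n} \int_{[0, 2\delta]^n} \frac{1}{(2\delta)^n} E[d_\delta^{(n)} \mid \mathbf{X} = \boldsymbol{\xi}_j + \boldsymbol{\alpha}]\, d\boldsymbol{\alpha}, \qquad (58)$$

where the second equality follows from the change of variable of integration to $\boldsymbol{\alpha} = \mathbf{x} - \boldsymbol{\xi}_j$, and the fact that $A = 2\delta K_0$.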

Some insight into what we have done may be gained by considering the special case where $K_0 = 2$ and $n = 2$. In this case the n-cube $[-A/2, A/2]^n$ is a square, and there are $K_0^n = 2^2 = 4$ members of $\mathcal{S}^n$, denoted $\boldsymbol{\xi}_1, \boldsymbol{\xi}_2, \boldsymbol{\xi}_3$, and $\boldsymbol{\xi}_4$. (See Fig. 5.) The subcubes are $C_1, C_2, C_3$, and $C_4$, as indicated.

Let us consider now the digital source $S_d$ defined in part (i), whose output is the sequence $S_1, S_2, \ldots$. We would like to transmit the outputs of $S_d$ through a channel (as in Fig. 1) with delay T, so that the source output must be an n-vector $(n = \rho_s T)$ $\mathbf{S}$, and the decoder output an n-vector $\hat{\mathbf{S}}$. The fidelity criterion is

$$d_H = E d_H^{(n)}(\mathbf{S}, \hat{\mathbf{S}}).$$

Now suppose that we are given an encoder-decoder $f_E^{(a)}, f_D^{(a)}$ for the analog source $S_a$ [for which $A/(2\delta) = K_0$], connected with delay T to a given channel. Say this encoder-decoder attains an average distortion $d_\delta$. We show that there exists an encoder-decoder for the $K_0$-ary digital source $S_d$, connected with delay T to the same channel, such that the average distortion $d_H \le d_\delta$. From this we deduce immediately that for $A/(2\delta) = K_0$,

$$P_e(T, K_0) \le Q(T, A, \delta). \qquad (59)$$



The digital encoder is given schematically in Fig. 6a. The analog encoder which we are given is $f_E^{(a)}(\mathbf{x})$, $\mathbf{x} \in [-A/2, A/2]^n$, and is realized in the right box of Fig. 6a. The function of the "modulator" is to assign to each n-vector $\mathbf{s} \in \{0, 1, \ldots, K_0 - 1\}^n$ a member of $[-A/2, A/2]^n$. This is done as follows. Let $\mathcal{S}$ be the set defined by equation (57). For $s \in \{0, 1, 2, \ldots, K_0 - 1\}$, let

$$g(s) = 2\delta\left(s - \frac{K_0}{2}\right)$$

be the sth member of $\mathcal{S}$. For $\mathbf{s} = (s_1, s_2, \ldots, s_n) \in \{0, 1, \ldots, K_0 - 1\}^n$, let

$$\mathbf{g}^{(n)}(\mathbf{s}) = [g(s_1), g(s_2), \ldots, g(s_n)].$$

When the input to the modulator is $\mathbf{s}$, its output is

$$\boldsymbol{\alpha} + \mathbf{g}^{(n)}(\mathbf{s}),$$

where $\boldsymbol{\alpha} = (\alpha_1, \alpha_2, \ldots, \alpha_n) \in [0, 2\delta]^n$ is a fixed vector. Thus the digital encoder is

$$f_E^{(d)}(\mathbf{s}) = f_E^{(a)}[\boldsymbol{\alpha} + \mathbf{g}^{(n)}(\mathbf{s})].$$

Fig. 5 — A digital encoder.

Fig. 6 — (a) Digital to analog encoder. (b) Analog to digital decoder. (c) Digital communication scheme.

The digital decoder is given schematically in Fig. 6b. The left box is the analog decoder $f_D^{(a)}$ which we are given. Its output $\hat{\mathbf{x}}$ is a real n-vector. The right box is a quantizer. When its input is $\hat{\mathbf{x}} = (\hat x_1, \ldots, \hat x_n)$, its output is $q(\hat{\mathbf{x}}) = \hat{\mathbf{s}} = (\hat s_1, \ldots, \hat s_n)$, where $\hat s_k$ $(k = 1, 2, \ldots, n)$ is a member of $\{0, 1, \ldots, K_0 - 1\}$ which minimizes $|g(\hat s_k) + \alpha_k - \hat x_k|$.
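The modulator and quantizer of Fig. 6 can be sketched in a few lines of Python. The sketch assumes that the analog encoder-channel-decoder moves each coordinate by less than δ, which is exactly the case in which the nearest-point quantizer returns $\hat S_k = S_k$; the names `g`, `modulate`, and `demodulate` and the random perturbation are ours.

```python
import random

def g(s, delta, K0):
    """s-th member of the set of left end-points (57): 2*delta*(s - K0/2)."""
    return 2 * delta * (s - K0 / 2)

def modulate(s_vec, alpha, delta, K0):
    """Modulator of Fig. 6a: alpha + g^(n)(s), with alpha a fixed vector in [0, 2*delta]^n."""
    return [a + g(s, delta, K0) for s, a in zip(s_vec, alpha)]

def demodulate(x_hat, alpha, delta, K0):
    """Quantizer of Fig. 6b: for each k, an index s minimizing |g(s) + alpha_k - x_hat_k|."""
    return [min(range(K0), key=lambda s: abs(g(s, delta, K0) + a - x))
            for x, a in zip(x_hat, alpha)]

rng = random.Random(2)
delta, K0, n = 0.25, 4, 6
alpha = [rng.uniform(0, 2 * delta) for _ in range(n)]
s_vec = [rng.randrange(K0) for _ in range(n)]
x = modulate(s_vec, alpha, delta, K0)
# Perturb each coordinate by less than delta; the 2*delta spacing makes the decision exact.
x_hat = [xk + rng.uniform(-0.99 * delta, 0.99 * delta) for xk in x]
print(demodulate(x_hat, alpha, delta, K0) == s_vec)  # True
```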

When the digital source $S_d$ is connected to the channel with this encoder-decoder pair, the result is schematized in Fig. 6c. (Upper case X's and S's are used to signify random variables.) The portion of the system in the dotted lines is precisely the analog encoder-channel-decoder which would produce an average distortion $d_\delta$ given by equation (56), if the analog input $\mathbf{X}$ were uniformly distributed on the n-cube $[-A/2, A/2]^n$. But this is not the case here. In fact, $\mathbf{X}$ takes only one of $K_0^n$ possible values. However, the quantity $E[d_\delta^{(n)}(\mathbf{X}, \hat{\mathbf{X}}) \mid \mathbf{X} = \mathbf{x}]$ is exactly the same in the system of Fig. 6c as in equation (56), for $\mathbf{x} = \mathbf{g}^{(n)}(\mathbf{s}) + \boldsymbol{\alpha}$ $(\mathbf{s} \in \{0, \ldots, K_0 - 1\}^n)$.

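Let us write an expression for the average distortion $d_H$ for the digital source. Note that $\hat S_k \ne S_k$ only if $|\hat X_k - [g(S_k) + \alpha_k]| \ge \delta$. Thus

$$d_H^{(n)}(\mathbf{s}, \hat{\mathbf{s}}) \le d_\delta^{(n)}[\mathbf{g}^{(n)}(\mathbf{s}) + \boldsymbol{\alpha}, \hat{\mathbf{x}}]$$

and

$$d_H = E d_H^{(n)}(\mathbf{S}, \hat{\mathbf{S}}) \le \sum_{\mathbf{s}} \frac{1}{K_0^n} E[d_\delta^{(n)}(\mathbf{X}, \hat{\mathbf{X}}) \mid \mathbf{X} = \mathbf{g}^{(n)}(\mathbf{s}) + \boldsymbol{\alpha}], \qquad (60)$$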

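where $\sum_{\mathbf{s}}$ is the sum over the $K_0^n$ equally likely values of $\mathbf{s}$. Let us now average the right member of expression (60) over all $\boldsymbol{\alpha}$ in $[0, 2\delta]^n$, with $\boldsymbol{\alpha}$ assumed to be uniformly distributed. That average is

$$\int_{[0, 2\delta]^n} \frac{d\boldsymbol{\alpha}}{(2\delta)^n} \sum_{\mathbf{s}} \frac{1}{K_0^n} E[d_\delta^{(n)}(\mathbf{X}, \hat{\mathbf{X}}) \mid \mathbf{X} = \mathbf{g}^{(n)}(\mathbf{s}) + \boldsymbol{\alpha}].$$

If we note that the vectors $\{\mathbf{g}^{(n)}(\mathbf{s})\}$ are in one-to-one correspondence with the $K_0^n$ members $\boldsymbol{\xi}_j$ of $\mathcal{S}^n$, this quantity may be written as

$$\sum_{j=1}^{K_0^n} \frac{1}{K_0^n} \int_{[0, 2\delta]^n} \frac{1}{(2\delta)^n} E[d_\delta^{(n)}(\mathbf{X}, \hat{\mathbf{X}}) \mid \mathbf{X} = \boldsymbol{\xi}_j + \boldsymbol{\alpha}]\, d\boldsymbol{\alpha},$$

which equals $d_\delta$ by equation (58). Since there must be at least one value of $\boldsymbol{\alpha}$ for which the right member of expression (60) is as small as the average, we have proved inequality (59).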

The first inequality of expression (23) follows from inequality (59) on noting, as in part (i), that $Q(T, A, \delta)$ is a nonincreasing function of δ.

3.3 Proof of Theorems 4 and 5

Since Theorem 5 includes Theorem 4 as a special case, we need only give a proof of Theorem 5. Our task is further simplified since the basic idea of the proof of Theorem 5 is the same as in Theorem 2 (Section 3.2). Here too we break the proof into two parts. In part (i) we assume that we are given an encoder-decoder for the digital source and deduce the existence of an encoder-decoder for the general source (which plays the part of the analog source in Theorem 2). In part (ii) we do the opposite. However, we do not have here the complications which necessitated an averaging argument in Section 3.2.

(i) We prove here that $G(T, \delta) \le \bar P_e[T, M_c(\delta)]$, the second inequality of expression (33). The proof parallels that of part (i) in Section 3.2. Instead of the analog source space $\mathcal{X}_a$ we have here a general space $\mathcal{X}$. The distortion is $d_\delta(x, \hat x)$ with $|x - \hat x|$ replaced by $\rho_0(x, \hat x)$.

To transmit the source outputs, which belong to $\mathcal{X}$, we use the system in Fig. 4. The digital encoder-decoder is for a $K_0$-ary source where $K_0 = M_c(\delta)$. We assume that it attains a guaranteed distortion $\bar d_H$. The quantizer is defined as follows. Let $\{\beta_i\}_{i=0}^{K_0 - 1}$ be a minimum δ-covering of $\mathcal{X}$. For $x \in \mathcal{X}$, let $q(x)$ be the smallest i $(0 \le i \le K_0 - 1)$ such that $x \in S_{\beta_i}(\delta)$. Then if $\mathbf{x} = (x_1, x_2, \ldots, x_n) \in \mathcal{X}^n$ is the source output, the quantizer output is $\mathbf{s} = \mathbf{q}^{(n)}(\mathbf{x}) = [q(x_1), q(x_2), \ldots, q(x_n)]$. The output of the digital decoder is $\hat{\mathbf{S}} = (\hat S_1, \hat S_2, \ldots, \hat S_n)$, and the converter output is $\hat{\mathbf{X}} = (\hat X_1, \ldots, \hat X_n)$, where $\hat X_k = \beta_i$ when $\hat S_k = i$. Clearly, if $\hat S_k = S_k$, then $\rho_0(x_k, \hat X_k) \le \delta$. Thus for any source output $\mathbf{x}$,

$$\bar d_\delta(\mathbf{x}) \le \bar d_H[\mathbf{q}^{(n)}(\mathbf{x})] \le \bar d_H,$$

so that the overall guaranteed distortion satisfies $\bar d_\delta \le \bar d_H$, from which part (i) follows.

(ii) We prove here that $\bar P_e[T, M_p(\delta)] \le G(T, \delta)$, the first inequality of expression (33). As in part (i), the proof of part (ii) parallels that in Section 3.2. Again $\mathcal{X}_a$ is replaced by $\mathcal{X}$ and $|x - \hat x|$ by $\rho_0(x, \hat x)$.

As in Section 3.2, we assume that we are given an encoder-decoder for the general source with guaranteed distortion $\bar d_\delta$. We set $K_0 = M_p(\delta)$ and use the system of Fig. 6 to transmit the outputs of the $K_0$-ary digital source. The modulator is defined as follows. Let $\{\beta_i\}_{i=0}^{K_0 - 1}$ be a maximum δ-packing of $\mathcal{X}$. If the source output is $\mathbf{s} = (s_1, s_2, \ldots, s_n)$, then the modulator output is $\mathbf{g}^{(n)}(\mathbf{s}) = (\beta_{s_1}, \beta_{s_2}, \ldots, \beta_{s_n})$. The output of the decoder is $\hat{\mathbf{X}} = (\hat X_1, \ldots, \hat X_n)$, and the quantizer output is $\hat{\mathbf{S}} = (\hat S_1, \hat S_2, \ldots, \hat S_n)$, where $\hat S_k = i$ when $\hat X_k \in S_{\beta_i}(\delta)$. If $\hat X_k \notin S_{\beta_i}(\delta)$ for all i $(0 \le i \le K_0 - 1)$, then $\hat S_k = 0$.

Clearly, if $\rho_0(X_k, \hat X_k) \le \delta$, where $X_k = \beta_{S_k}$ is the kth modulator output, then $\hat S_k = S_k$. Thus for any source output $\mathbf{s}$, the conditional expectation satisfies

$$\bar d_H(\mathbf{s}) \le \bar d_\delta[\mathbf{g}^{(n)}(\mathbf{s})] \le \bar d_\delta.$$

Thus the overall guaranteed distortion satisfies $\bar d_H \le \bar d_\delta$, completing the proof of part (ii) and the theorem.
3.4 Proofs on Packing and Covering

In this section we give a proof of Theorem 3, the main part of which is a lemma on covering of the K-ary n-cube. We also prove Lemma 1, relating packing and covering, in Section 2.4.

3.4.1 Proof of Theorem 3 

We first establish the following lemma. 

Lemma 2: Let θ $(0 < \theta < (K - 1)/K)$ be arbitrary, and let r satisfy

$$R_{eq}(\theta) < r < \log K,$$

where $R_{eq}(\theta)$ is the equivalent rate for the K-ary source given by expressions (9). Using the terminology of Section 2.4, let $\mathcal{X} = \{0, 1, \ldots, K - 1\}^n$ (the K-ary n-cube) and $\rho_0(\mathbf{x}, \hat{\mathbf{x}}) = d_H^{(n)}(\mathbf{x}, \hat{\mathbf{x}})$. Then for n sufficiently large, there exists a θ-covering of $\mathcal{X}$ with $M = e^{rn}$ points.

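Proof: Let $\{\mathbf{x}_i\}_1^M$ be a set of K-ary n-vectors. Let $F(\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_M)$ be the number of members $\mathbf{x}$ of $\mathcal{X}$ such that $d_H^{(n)}(\mathbf{x}, \mathbf{x}_i) > \theta$ for all $i = 1, 2, \ldots, M$. If $F = 0$, then $\{\mathbf{x}_i\}_1^M$ is a θ-covering of $\mathcal{X}$. We can write

$$F(\mathbf{x}_1, \ldots, \mathbf{x}_M) = \sum_{\mathbf{x} \in \mathcal{X}} \Phi(\mathbf{x}, \mathbf{x}_1, \ldots, \mathbf{x}_M),$$

where

$$\Phi(\mathbf{x}, \mathbf{x}_1, \ldots, \mathbf{x}_M) = \begin{cases} 1 & \text{if } d_H^{(n)}(\mathbf{x}, \mathbf{x}_i) > \theta \text{ for all } i = 1, 2, \ldots, M, \\ 0 & \text{otherwise.} \end{cases}$$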

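Now consider an experiment in which $M = e^{rn}$ n-vectors $\{\mathbf{X}_i\}_1^M$ are chosen at random from $\mathcal{X}$ independently with identical (uniform) distribution

$$\Pr\{\mathbf{X}_i = \mathbf{x}\} = K^{-n}.$$

Then $F(\mathbf{X}_1, \mathbf{X}_2, \ldots, \mathbf{X}_M)$ is a random variable with expectation

$$EF = \sum_{\mathbf{x} \in \mathcal{X}} E\Phi(\mathbf{x}, \mathbf{X}_1, \mathbf{X}_2, \ldots, \mathbf{X}_M),$$

where, as indicated, $E\Phi$ is computed with $\mathbf{x}$ held fixed. Now for a given $\mathbf{x}$,

$$E\Phi(\mathbf{x}, \mathbf{X}_1, \ldots, \mathbf{X}_M) = \Pr\{\Phi = 1\} = \Pr\left\{\bigcap_{i=1}^{M} [d_H^{(n)}(\mathbf{x}, \mathbf{X}_i) > \theta]\right\} = [\Pr\{d_H^{(n)}(\mathbf{x}, \mathbf{X}_1) > \theta\}]^M,$$

where the last equality follows from the independence and identical distribution of the random vectors $\{\mathbf{X}_i\}$. Letting

$$a_n = \sum_{0 \le i \le \theta n} \binom{n}{i} (K - 1)^i K^{-n}$$

be the probability that $d_H^{(n)}(\mathbf{x}, \mathbf{X}_1) \le \theta$, we have

$$E\Phi(\mathbf{x}, \mathbf{X}_1, \ldots, \mathbf{X}_M) = (1 - a_n)^M \le e^{-a_n M},$$

independent of $\mathbf{x}$. Thus, since $\mathcal{X}$ has $K^n$ members,

$$EF \le K^n e^{-a_n M}.$$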

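Now it is well known (see, for example, Ref. 8, p. 173) that for $0 < \theta < (K - 1)/K$, as $n \to \infty$,

$$a_n = e^{-n[R_{eq}(\theta) + o(1)]}.$$

Thus, since $M = e^{rn}$ and $r > R_{eq}(\theta)$,

$$E(F) \le K^n e^{-a_n M} = \exp\left\{n \log K - e^{n[r - R_{eq}(\theta) + o(1)]}\right\} \to 0, \quad \text{as } n \to \infty.$$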

Now there must be at least one particular set $\{\mathbf{x}_i\}_1^M$ such that

$$F(\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_M) \le EF.$$

Thus if we choose n large enough that $E(F) < 1$, then $F(\mathbf{x}_1, \ldots, \mathbf{x}_M) = 0$ (since F is an integer-valued function). Thus $\{\mathbf{x}_i\}_1^M$ is the required covering.
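The two facts used in the proof, $a_n = e^{-n[R_{eq}(\theta) + o(1)]}$ and $EF \le K^n e^{-a_n M}$, can be explored numerically with exact binomial sums. A minimal Python sketch, assuming $R_{eq}(\theta) = \log K - h(\theta) - \theta \log(K - 1)$ in nats, the Hamming-distortion form that also appears in (21) (expressions (9) themselves are outside this excerpt); the function names are ours.

```python
import math

def log_a_n(n, K, theta):
    """log a_n, where a_n = sum_{0 <= i <= theta*n} C(n, i) (K-1)^i K^{-n} is the probability
    that a uniform K-ary n-vector is within normalized Hamming distance theta of a fixed one."""
    count = sum(math.comb(n, i) * (K - 1) ** i for i in range(math.floor(theta * n) + 1))
    return math.log(count) - n * math.log(K)       # exact integer count, then logarithms

def r_eq(theta, K):
    """R_eq(theta) = log K - h(theta) - theta log(K - 1) for 0 < theta < (K-1)/K (nats)."""
    h = -theta * math.log(theta) - (1 - theta) * math.log(1 - theta)
    return math.log(K) - h - theta * math.log(K - 1)

def log_bound_on_EF(n, K, theta, r):
    """log of the bound EF <= K^n exp(-a_n M) on the expected number of uncovered
    vectors when M = e^{rn} random vectors are drawn."""
    log_aM = log_a_n(n, K, theta) + r * n
    if log_aM > 700:                               # a_n M would overflow a float; bound is ~0
        return -math.inf
    return n * math.log(K) - math.exp(log_aM)

n, K, theta = 200, 2, 0.25
print(-log_a_n(n, K, theta) / n, r_eq(theta, K))   # close for large n
print(log_bound_on_EF(n, K, theta, r=0.2) < 0)     # True: r > R_eq, so EF < 1
print(log_bound_on_EF(n, K, theta, r=0.1) < 0)     # False: r < R_eq, the bound is useless
```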

The proof of Theorem 3 now follows the standard proof of a source-channel coding theorem, with Lemma 2 playing the role of the source coding theorem. (See Ref. 2.) Roughly speaking, the proof is as follows. When $\gamma(K, \rho_s, C) = (K - 1)/K$, the entire theorem is trivial, since we can attain a guaranteed distortion of $(K - 1)/K$ without even using the channel, by simply letting the decoder outputs take the value i $(0 \le i \le K - 1)$ with probability 1/K. Thus assume that $0 \le \gamma < (K - 1)/K$.

The channel can transmit $e^{RT}$ code words (where R < C, the channel capacity) in T seconds with arbitrarily high reliability (see Section 2.1). By the definition of $\gamma = \gamma(K, \rho_s, C)$ [expression (10b)],

$$R_{eq}(\gamma) \le C/\rho_s. \qquad (61)$$

Let $\epsilon > 0$ be arbitrary. In Lemma 2, let $r = C/\rho_s - \epsilon_1$, where $\epsilon_1 > 0$ will be chosen below. Then approximate the T-second source output (a K-ary n-vector, $n = \rho_s T$) by a (covering) set with $e^{rn} = e^{r\rho_s T}$ members. Since $r\rho_s < C$, we can transmit these n-vectors through the channel with arbitrarily high reliability. Further, if

$$r > R_{eq}(\gamma + \epsilon), \qquad (62)$$

we have from Lemma 2 that the error in making the approximation will always be less than or equal to $(\gamma + \epsilon)$ for T sufficiently large. In fact, if we set

$$\epsilon_1 = R_{eq}(\gamma) - R_{eq}\left(\gamma + \frac{\epsilon}{2}\right) > 0$$

[since $R_{eq}(\gamma)$ as defined in equation (9) is strictly decreasing for $\gamma < (K - 1)/K$], then [using inequality (61)]

$$r = \frac{C}{\rho_s} - \epsilon_1 = \frac{C}{\rho_s} - R_{eq}(\gamma) + R_{eq}\left(\gamma + \frac{\epsilon}{2}\right) \ge R_{eq}\left(\gamma + \frac{\epsilon}{2}\right) > R_{eq}(\gamma + \epsilon),$$

and condition (62) is satisfied.

We conclude that for T sufficiently large, we can make

$$\bar d_H \le \gamma + \epsilon$$

for arbitrary $\epsilon > 0$. Thus

$$\lim_{T \to \infty} \bar P_e(T, K) \le \gamma + \epsilon \to \gamma, \quad \text{as } \epsilon \to 0,$$

which is Theorem 3.
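The quantity γ(K, ρ_s, C) can be computed by bisection. This sketch assumes that expression (10b), which is not reproduced in this excerpt, defines γ as the smallest θ with $R_{eq}(\theta) \le C/\rho_s$; that reading is consistent with (61) and with the strict decrease of $R_{eq}$ used above. $R_{eq}$ is taken in the Hamming form of (21), in nats, and the function names are ours.

```python
import math

def r_eq(theta, K):
    """R_eq(theta) = log K - h(theta) - theta log(K - 1) on [0, (K-1)/K), and 0 beyond (nats)."""
    if theta >= (K - 1) / K:
        return 0.0
    h = -sum(p * math.log(p) for p in (theta, 1 - theta) if p > 0)
    return math.log(K) - h - theta * math.log(K - 1)

def gamma(K, rho_s, C, iters=100):
    """Smallest theta with R_eq(theta) <= C / rho_s, by bisection on [0, (K-1)/K]."""
    target = C / rho_s
    if target >= math.log(K):       # R_eq(0) = log K, so theta = 0 already qualifies
        return 0.0
    lo, hi = 0.0, (K - 1) / K       # invariant: R_eq(lo) > target >= R_eq(hi)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if r_eq(mid, K) > target:
            lo = mid
        else:
            hi = mid
    return hi

# Binary source whose rate per symbol C/rho_s equals R_eq(0.1): gamma should be about 0.1.
target = math.log(2) + 0.1 * math.log(0.1) + 0.9 * math.log(0.9)
print(gamma(2, 1.0, target))
```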

3.4.2 Proof of Lemma 1

We say that $\mathcal{A} \subset \mathcal{X}$ is a "maximal Δ-packing" if $\mathcal{A}$ is a Δ-packing and if, for all $v \notin \mathcal{A}$, the union $\{v\} \cup \mathcal{A}$ is not a Δ-packing. We establish Lemma 1 by showing that every maximal Δ-packing is a (2ηΔ)-covering. Let $\mathcal{A}$ be a maximal Δ-packing. If $\mathcal{A}$ is not a (2ηΔ)-covering, then there exists a $v_0 \in \mathcal{X}$ such that $\rho_0(v_0, u) > 2\eta\Delta$ for all $u \in \mathcal{A}$. From condition (30b), $v_0 \notin \mathcal{A}$. We claim that $\{v_0\} \cup \mathcal{A}$ is a Δ-packing, contradicting the maximality of $\mathcal{A}$. If $w \in S_{v_0}(\Delta)$, then for all $u \in \mathcal{A}$ (using the definition of η)

$$\rho_0(v_0, u) \le \eta[\rho_0(v_0, w) + \rho_0(w, u)],$$

so that

$$\rho_0(w, u) \ge \frac{\rho_0(v_0, u)}{\eta} - \rho_0(v_0, w) > \frac{2\eta\Delta}{\eta} - \Delta = \Delta.$$

Thus $w \notin S_u(\Delta)$, and $\{v_0\} \cup \mathcal{A}$ is a Δ-packing, establishing the lemma.
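On a finite space the argument is constructive: a one-pass greedy selection yields a maximal Δ-packing, which must then be a (2ηΔ)-covering. The Python sketch below assumes an ordinary metric, for which the relaxed triangle inequality used in the proof holds with η = 1, and closed spheres $S_u(\Delta) = \{w : \rho_0(u, w) \le \Delta\}$; the function name and the integer grid standing in for $[-A/2, A/2]$ are ours.

```python
def greedy_maximal_packing(points, rho, Delta):
    """One pass over a finite space `points`: keep v whenever its closed Delta-sphere
    S_v = {w : rho(v, w) <= Delta} is disjoint from the spheres of the points kept so far.
    A rejected point stays rejected, so the result is a maximal Delta-packing."""
    packing, claimed = [], set()
    for v in points:
        sphere = {w for w in points if rho(v, w) <= Delta}
        if not sphere & claimed:
            packing.append(v)
            claimed |= sphere
    return packing

# Integer grid standing in for [-A/2, A/2] in units of 0.01, with rho(x, y) = |x - y| (eta = 1).
points = list(range(-100, 101))
rho = lambda x, y: abs(x - y)
Delta = 10
packing = greedy_maximal_packing(points, rho, Delta)
covering_radius = max(min(rho(x, u) for u in packing) for x in points)
print(len(packing), covering_radius <= 2 * Delta)  # 10 True
```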
APPENDIX

List of Symbols

$\mathcal{X}$ the source output space
$P_s(x)$ the source probability density function
$\rho_s$ source output rate (symbols per second)
$x_i$ $(x_i \in \mathcal{X})$ the ith output of the source
$\mathbf{x}$ $(x_1, x_2, \ldots, x_n) \in \mathcal{X}^n$
$\mathcal{W}_T$ set of "allowable" channel inputs
$\mathcal{Z}_T$ the set of all channel outputs
$T$ the coding delay
$n$ $= \rho_s T$
$f_E(\mathbf{x})$ the encoding function, $f_E(\mathbf{x}) \in \mathcal{W}_T$
$f_D(\mathbf{z})$ the decoding function, $f_D(\mathbf{z}) \in \mathcal{X}^n$
$\hat{\mathbf{x}}$ the decoded n-vector, $\hat{\mathbf{x}} = f_D(\mathbf{z}) \in \mathcal{X}^n$
$N$ number of code words in a code
$R$ $= (1/T)\log N$, the rate of a code
$\lambda$ the word probability of error
$\lambda^*(T, N)$ smallest attainable word error probability for a code with parameters N and T
$\bar\lambda$ average probability of error
$d(x, \hat x)$ the distortion function
$d_H(x, \hat x)$ $= 0$ if $x = \hat x$; $= 1$ if $x \ne \hat x$
$d^{(n)}(\mathbf{x}, \hat{\mathbf{x}})$ $= (1/n)\sum_{k=1}^{n} d(x_k, \hat x_k)$
$d$ $= E d^{(n)}(\mathbf{x}, \hat{\mathbf{x}})$
$d^*(T)$ the smallest attainable $d$ for a given delay T
$d_\delta(x, \hat x)$ $= 0$ if $|x - \hat x| \le \delta$; $= 1$ if $|x - \hat x| > \delta$
$\bar d(\mathbf{x})$ the expectation of $d^{(n)}(\mathbf{x}, \hat{\mathbf{x}})$ given $\mathbf{x}$
$\bar d$ $= \sup_{\mathbf{x} \in \mathcal{X}^n} \bar d(\mathbf{x})$
$\bar d^*(T)$ the smallest attainable value of $\bar d$ for a given delay T
$Q(T, A, \delta)$ $d^*(T)$ for $d_\delta(x, \hat x)$ and $x \in [-A/2, A/2]$
$\bar Q(T, A, \delta)$ $\bar d^*(T)$ for $d_\delta(x, \hat x)$ and $x \in [-A/2, A/2]$
$P_e(T, K)$ the minimum attainable per-symbol error rate for an equiprobable K-ary memoryless source
$\gamma(K, \rho_s, C)$ $= \lim_{T \to \infty} P_e(T, K)$
$C$ the channel capacity
$\bar P_e(T, K)$ the minimum attainable guaranteed per-symbol error rate for a K-ary source
$G(T, \delta)$ generalization of $\bar Q$, defined in Section 2.4